Abstract. We give a survey of a number of simple applications of
renewal theory to problems on random strings and tries: insertion depth,
size, insertion mode and imbalance of tries; variations for b-tries and
Patricia tries; Khodak and Tunstall codes.



1. Introduction 

Although it has long been realized that renewal theory is a useful tool
in the study of random strings and related structures, it has not always
been used to its full potential. The purpose of the present paper is to give
a survey presenting in a unified way some simple applications of renewal
theory to a number of problems involving random strings, in particular
several problems on tries, which are tree structures constructed from strings.
(Other applications of renewal theory to problems on random trees are given
in, e.g., [?] and [?].)

Since our purpose is to illustrate a method rather than to prove new 
results, we present a number of problems in a simple form without trying 
to be as general as possible. In particular, for simplicity we exclusively 
consider random strings in the alphabet {0, 1}, and assume that the "letters"
(bits) $\xi_i$ in the strings are i.i.d. Note, however, that the methods below
are much more widely applicable and extend in a straightforward way to
larger alphabets. The methods also, at least in principle, extend to, for
example, Markov sources where $(\xi_i)$ is a Markov chain. (See e.g. Szpankowski
[32, Section 2.1] and Clément, Flajolet and Vallée [5] for various interesting
probability models of random strings. Renewal theory for Markov chains is
treated for example by Kesten [21] and Athreya, McDonald and Ney [?].)
Indeed, one of the purposes of this paper is to make propaganda for the use 
of renewal theory to study e.g. Markov models, even if we do not do this in 
the present paper. (Some such results may appear elsewhere.) 

The results below are (mostly) not new; they have earlier been proved by 
other methods, in particular Mellin transforms. (We try to give proper
references for the theorems, but we do not attempt to cover the large literature
on random tries and strings in any completeness.) Indeed, such methods
often provide sharper results, with better error bounds or higher order terms,
and these methods too certainly are important. Nevertheless, we believe
that renewal theory often is a valuable method that yields the leading terms
in a simple and intuitive way, and that it ought to be more widely used
for this type of problem. Moreover, as said above, this method may be
easier to extend to other situations. (Further, it gives one explanation for
the oscillatory terms that often appear, as an instance of the arithmetic case
in renewal theory. Note that oscillatory terms become much less common
for larger alphabets, except when all letters are equiprobable, because it is
more difficult to be arithmetic, see Appendix A.)

We treat a number of problems on random tries in Sections 3-5 and 8
(insertion depth, imbalance, size, insertion mode). We consider b-tries in
Section 6 and Patricia tries in Section 7. Tunstall and Khodak codes are
studied in Section 9. A random walk in a region bounded by two crossing
lines is studied in Section 10. The standard results from renewal theory that
we use are for convenience collected in Appendix A.

Notation. We use $\overset{p}{\longrightarrow}$ and $\overset{d}{\longrightarrow}$ for convergence in probability and in
distribution, respectively.

If $Z_n$ is a sequence of random variables and $\mu_n$ and $\sigma_n^2$ are sequences of
real numbers with $\sigma_n^2 > 0$ (for large $n$, at least), then $Z_n \sim \mathrm{AsN}(\mu_n, \sigma_n^2)$
means that $(Z_n - \mu_n)/\sigma_n \overset{d}{\longrightarrow} N(0, 1)$.

We denote the fractional part of a real number $x$ by $\{x\} := x - \lfloor x \rfloor$.

Acknowledgement. I thank Allan Gut and Wojciech Szpankowski for in- 
spiration and helpful discussions. 

2. Preliminaries 

Suppose that $\Xi^{(1)}, \Xi^{(2)}, \dots$ is an i.i.d. sequence of random infinite strings
$\Xi^{(n)} = \xi_1^{(n)} \xi_2^{(n)} \cdots$, with letters $\xi_i^{(n)}$ in an alphabet $\mathcal{A}$. (When the superscript
$n$ does not matter we drop it; we thus write $\Xi = \xi_1 \xi_2 \cdots$ for a generic string
in the sequence.) For simplicity, we consider only the case $\mathcal{A} = \{0, 1\}$, and
further assume that the individual letters $\xi_i$ are i.i.d. with $\xi_i \sim \mathrm{Be}(p)$ for
some fixed $p \in (0, 1)$, i.e., $\mathbb{P}(\xi_i = 1) = p$ and $\mathbb{P}(\xi_i = 0) = q := 1 - p$.

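Given a finite string $\alpha_1 \cdots \alpha_n \in \mathcal{A}^n$, let $P(\alpha_1 \cdots \alpha_n)$ be the probability
that the random string $\Xi$ begins with $\alpha_1 \cdots \alpha_n$. In particular, for a single
letter, $P(0) = q$ and $P(1) = p$, and in general

$$P(\alpha_1 \cdots \alpha_n) = \prod_{i=1}^n P(\alpha_i) = \prod_{i=1}^n p^{\alpha_i} q^{1-\alpha_i}. \tag{2.1}$$

Given a random string $\xi_1 \xi_2 \cdots$, we define

$$X_i := -\ln P(\xi_i) = -\ln\bigl(p^{\xi_i} q^{1-\xi_i}\bigr) = \begin{cases} -\ln p, & \xi_i = 1, \\ -\ln q, & \xi_i = 0. \end{cases} \tag{2.2}$$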
Note that $X_1, X_2, \dots$ is an i.i.d. sequence of positive random variables with

$$\mathbb{E} X_i = H := -p \ln p - q \ln q, \tag{2.3}$$

the usual entropy of each letter $\xi_i$, and

$$\mathbb{E} X_i^2 = H_2 := p \ln^2 p + q \ln^2 q, \tag{2.4}$$
$$\operatorname{Var} X_i = H_2 - H^2 = pq(\ln p - \ln q)^2 = pq \ln^2(p/q). \tag{2.5}$$

Note that the case $p = q = 1/2$ is special; in this case $X_i = \ln 2$ is deterministic
and $\operatorname{Var} X_i = 0$; for all other $p \in (0, 1)$, $0 < \operatorname{Var} X_i < \infty$.
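
As a small numerical illustration (not part of the original text), the quantities $H$, $H_2$ and $\operatorname{Var} X_i$ can be evaluated directly from their definitions, and the closed form $pq \ln^2(p/q)$ checked against $H_2 - H^2$. The function name `letter_moments` and the sample value $p = 0.3$ are our own choices.

```python
import math

def letter_moments(p):
    """Return (H, H2, var) for X = -ln P(xi), where xi ~ Be(p)."""
    q = 1.0 - p
    H = -p * math.log(p) - q * math.log(q)            # entropy, cf. (2.3)
    H2 = p * math.log(p) ** 2 + q * math.log(q) ** 2  # second moment, cf. (2.4)
    var = H2 - H * H                                  # variance, cf. (2.5)
    return H, H2, var

# Compare H2 - H^2 with the closed form pq ln^2(p/q) for an arbitrary p.
H, H2, var = letter_moments(0.3)
closed_form = 0.3 * 0.7 * math.log(0.3 / 0.7) ** 2

# In the symmetric case the variance vanishes (up to rounding).
H_sym, H2_sym, var_sym = letter_moments(0.5)
```

For $p = 1/2$ the returned entropy is $\ln 2$ and the variance is zero up to floating-point rounding, in line with the remark above.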

By (2.2), $X_i$ is supported on $\{\ln(1/p), \ln(1/q)\}$. It is well-known, both
in renewal theory and in the analysis of tries, that one frequently has to
distinguish between two cases: the arithmetic (or lattice) case when the
support is a subset of $d\mathbb{Z}$ for some $d > 0$, and the non-arithmetic (or
non-lattice) case when it is not, see further Appendix A. For $X_i$ given by (2.2),
this yields the following cases:

arithmetic: The ratio $\ln p / \ln q$ is rational. More precisely, $X_i$ then is
$d$-arithmetic, where $d$ equals $\gcd(\ln p, \ln q)$, the largest positive real
number such that $\ln p$ and $\ln q$ both are integer multiples of $d$. If
$\ln p / \ln q = a/b$, where $a$ and $b$ are relatively prime positive integers,
then

$$d = \gcd(\ln p, \ln q) = \frac{|\ln p|}{a} = \frac{|\ln q|}{b}. \tag{2.6}$$

non-arithmetic: The ratio $\ln p / \ln q$ is irrational.
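
To make the arithmetic case concrete, the following sketch (our own illustration) constructs, for given relatively prime positive integers $a$ and $b$, the unique $p$ with $\ln p / \ln q = a/b$. It uses the observation that then $p = e^{-ad}$ and $q = e^{-bd}$, so the span $d$ solves $e^{-ad} + e^{-bd} = 1$; the bisection solver and the name `lattice_span` are assumptions of this sketch, not taken from the paper.

```python
import math

def lattice_span(a, b):
    """Solve exp(-a*d) + exp(-b*d) = 1 for d > 0 by bisection.

    With p = exp(-a*d) and q = exp(-b*d) we get p + q = 1 and
    ln p / ln q = a / b, so X_i is d-arithmetic as in (2.6).
    """
    def excess(d):
        return math.exp(-a * d) + math.exp(-b * d) - 1.0

    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:      # excess is decreasing in d; bracket the root
        hi *= 2.0
    for _ in range(200):         # plain bisection
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    d = 0.5 * (lo + hi)
    return d, math.exp(-a * d), math.exp(-b * d)

d_sym, p_sym, q_sym = lattice_span(1, 1)   # symmetric case: d = ln 2
d, p, q = lattice_span(1, 2)               # q = p^2, so p is the inverse golden ratio
```

For $(a, b) = (1, 1)$ this returns $d = \ln 2$ and $p = q = 1/2$; for $(a, b) = (1, 2)$ it returns $p = (\sqrt5 - 1)/2$, the solution of $p + p^2 = 1$.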
We let $S_n$ denote the partial sums of $X_i$: $S_n := \sum_{i=1}^n X_i$. Thus

$$P(\xi_1 \cdots \xi_n) = \prod_{i=1}^n P(\xi_i) = \prod_{i=1}^n e^{-X_i} = e^{-S_n}. \tag{2.7}$$

(This is a random variable, since it depends on the random string $\xi_1 \cdots \xi_n$; it
can be interpreted as the probability that another random string begins
with the same $n$ letters as observed.)

We introduce the standard renewal theory notations (see e.g. Gut [13,
Chapter 2]), for $t \ge 0$ and $n \ge 1$,

$$\nu(t) := \min\{n : S_n > t\}, \tag{2.8}$$
$$F_n(t) := \mathbb{P}(S_n \le t) = \mathbb{P}(\nu(t) > n), \tag{2.9}$$
$$U(t) := \mathbb{E}\,\nu(t) = \sum_{n=0}^{\infty} F_n(t). \tag{2.10}$$

Note that (2.10) means that, for any function $g \ge 0$,

$$\int_0^\infty g(t)\, dU(t) = \sum_{n=0}^{\infty} \int_0^\infty g(t)\, dF_n(t) = \sum_{n=0}^{\infty} \mathbb{E}\, g(S_n). \tag{2.11}$$
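
The renewal function $U(t)$ in (2.10) can be computed exactly in our two-point setting, since $S_n$ takes the value $j \ln(1/p) + (n-j)\ln(1/q)$ with probability $\binom{n}{j} p^j q^{n-j}$. The following sketch is our own illustration (the name `renewal_function` is hypothetical), using the convention $S_0 = 0$.

```python
import math

def renewal_function(t, p):
    """U(t) = sum_{n>=0} P(S_n <= t), cf. (2.10), for steps in {ln(1/p), ln(1/q)}."""
    q = 1.0 - p
    a, b = -math.log(p), -math.log(q)   # the two possible values of X_i
    total, n = 0.0, 0
    while n * min(a, b) <= t:           # for larger n, S_n > t with certainty
        for j in range(n + 1):          # j letters equal to 1, n - j equal to 0
            if j * a + (n - j) * b <= t:
                total += math.comb(n, j) * p**j * q**(n - j)
        n += 1
    return total

u_sym = renewal_function(5.0, 0.5)      # symmetric case: S_n = n ln 2
u_zero = renewal_function(0.0, 0.3)     # only n = 0 contributes
```

In the symmetric case $p = 1/2$ the sum reduces to counting the $n \ge 0$ with $n \ln 2 \le t$, that is, $\lfloor t/\ln 2 \rfloor + 1$. For other $p$ one may compare the output for large $t$ with the leading term $t/H$.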
We also allow the summation to start with an initial random variable $X_0$,
which is independent of $X_1, X_2, \dots$, but may have an arbitrary real-valued
distribution. We then define

$$\hat S_n := \sum_{i=0}^{n} X_i = X_0 + \sum_{i=1}^{n} X_i, \tag{2.12}$$
$$\hat\nu(t) := \min\{n : \hat S_n > t\}. \tag{2.13}$$

3. Insertion depth in a trie 

A trie is a binary tree structure designed to store a set of strings. It is
constructed from the strings by the following recursive procedure, see further
e.g. Knuth [22, Section 6.3], Mahmoud [25, Chapter 5] or Szpankowski [32,
Section 1.1]: If the set of strings is empty, then the trie is empty; if there is
only one string, then the trie consists of a single node (the root), and the
string is stored there; if there is more than one string, then the trie begins
with a root, without any string stored, all strings that begin with 0 are
passed to the left subtree of the root, and all strings that begin with 1 are
passed to the right subtree. In the latter case, the subtrees are constructed
recursively by the same procedure, with the only difference that at the kth
level, the strings are partitioned according to the kth letter. We assume that
the strings are distinct (in our random model, this holds with probability
1), and then the procedure terminates. Note that one string is stored in
each leaf of the trie, and that no strings are stored in the remaining nodes.
The leaves are also called external nodes and the remaining nodes are called
internal nodes; note that every internal node has one or two children.
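
The recursive construction just described can be sketched in a few lines. This is a minimal illustration under our own representation choices (a leaf is the stored string, an internal node is a pair of subtries, the empty trie is `None`); it assumes finite 0/1 strings that are distinct and long enough to be separated.

```python
def build_trie(strings, depth=0):
    """Recursively build a binary trie from distinct 0/1 strings.

    Returns None (empty trie), a str (leaf storing that string),
    or a pair (left, right) for an internal node.
    """
    if not strings:
        return None
    if len(strings) == 1:
        return strings[0]
    # Partition on the letter at the current level.
    left = [s for s in strings if s[depth] == "0"]
    right = [s for s in strings if s[depth] == "1"]
    return (build_trie(left, depth + 1), build_trie(right, depth + 1))

def leaf_depths(trie, depth=0):
    """Map each stored string to the depth of its leaf."""
    if trie is None:
        return {}
    if isinstance(trie, str):
        return {trie: depth}
    depths = {}
    for child in trie:
        depths.update(leaf_depths(child, depth + 1))
    return depths

def count_internal(trie):
    """Number of internal nodes of the trie."""
    if trie is None or isinstance(trie, str):
        return 0
    return 1 + sum(count_internal(child) for child in trie)

trie = build_trie(["0010", "0111", "1100", "0110"])
depths = leaf_depths(trie)
internal = count_internal(trie)
```

In this example the internal nodes are the prefixes shared by at least two strings, namely the empty string, 0, 01 and 011; the node 01 has only one child, since no string begins with 010.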

The trie is a finite subtree of the complete infinite binary tree $\mathcal{T}_\infty$, where
the nodes can be labelled by finite strings $\alpha = \alpha_1 \cdots \alpha_k \in \mathcal{A}^* := \bigcup_{k=0}^{\infty} \mathcal{A}^k$
(the root is the empty string). It is easily seen that a node $\alpha_1 \cdots \alpha_k$ in $\mathcal{T}_\infty$
is an internal node of the trie if and only if there are at least 2 strings (in
the given set) that start with $\alpha_1 \cdots \alpha_k$, and (for $k \ge 1$) that $\alpha_1 \cdots \alpha_k$ is an
external node if and only if there is exactly one such string, and there is at
least one other string beginning with $\alpha_1 \cdots \alpha_{k-1}$.

Let $D_n$ be the depth (= path length) of the node containing a given
string, for example the first, in the trie constructed from $n$ random strings
$\Xi^{(1)}, \dots, \Xi^{(n)}$. (By symmetry, any of the $n$ strings will have a depth with the
same distribution.) Denoting the chosen string by $\Xi = \xi_1 \xi_2 \cdots$, the depth
$D_n$ is thus at most $k$ if and only if no other of the strings begins with $\xi_1 \cdots \xi_k$.
Conditioning on the string $\Xi$, each of the other strings has this beginning
with probability $P(\xi_1 \cdots \xi_k)$, and thus by independence, recalling (2.7),

$$\mathbb{P}(D_n \le k \mid \Xi) = \bigl(1 - P(\xi_1 \cdots \xi_k)\bigr)^{n-1} = \bigl(1 - e^{-S_k}\bigr)^{n-1}. \tag{3.1}$$

Let $X_0 = X_0^{(n)}$ be a random variable with the distribution

$$\mathbb{P}\bigl(X_0^{(n)} > x\bigr) = \bigl(1 - e^{x}/n\bigr)_+^{n-1} = \bigl(1 - e^{x - \ln n}\bigr)_+^{n-1}, \qquad x \in (-\infty, \infty). \tag{3.2}$$
As $n \to \infty$, this converges to $\exp(-e^{x})$, and thus $X_0^{(n)} \overset{d}{\longrightarrow} X_0^*$, where $-X_0^*$
has the Gumbel distribution with $\mathbb{P}(-X_0^* \le x) = \exp(-\exp(-x))$.

Remark 3.1. It is easily seen that $X_0^{(n)} \overset{d}{=} \ln n - \max\{Z_1, \dots, Z_{n-1}\}$, where
$Z_1, Z_2, \dots$ are i.i.d. Exp(1) random variables. Cf. Leadbetter, Lindgren and Rootzén
[?, Example 1.7.2].
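
The convergence of the tail in (3.2) to the Gumbel tail $\exp(-e^x)$ is easy to inspect numerically. The sketch below is our own illustration; it reads (3.2) with the positive part taken, so that the expression is a probability for every real $x$, and the grid of $x$-values and the value $n = 10^6$ are arbitrary choices.

```python
import math

def tail_X0n(x, n):
    """P(X_0^(n) > x) as in (3.2), with the positive part of 1 - e^x/n."""
    base = max(0.0, 1.0 - math.exp(x) / n)
    return base ** (n - 1)

def tail_limit(x):
    """Limiting tail exp(-e^x); equivalently, -X_0^* is standard Gumbel."""
    return math.exp(-math.exp(x))

# Largest discrepancy over a grid of x in [-5, 5] for n = 10^6.
grid = [k / 10 for k in range(-50, 51)]
max_gap = max(abs(tail_X0n(x, 10**6) - tail_limit(x)) for x in grid)
```

The discrepancy is of order $1/n$ uniformly over the grid, which is the sense in which $X_0^{(n)}$ may be replaced by its limit $X_0^*$ in the arguments below.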



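Using (3.2), we can rewrite (3.1) as

$$\mathbb{P}(D_n \le k \mid \Xi) = \mathbb{P}\bigl(X_0^{(n)} > \ln n - S_k \mid \Xi\bigr) \tag{3.3}$$

and thus, recalling (2.12) and (2.13),

$$\mathbb{P}(D_n \le k) = \mathbb{P}\bigl(X_0^{(n)} > \ln n - S_k\bigr) = \mathbb{P}\bigl(\hat S_k > \ln n\bigr) = \mathbb{P}\bigl(\hat\nu(\ln n) \le k\bigr). \tag{3.4}$$

Since $k \ge 1$ is arbitrary, this shows that

$$D_n \overset{d}{=} \hat\nu(\ln n). \tag{3.5}$$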



In the case $p = 1/2$, $S_k = k \ln 2$ is non-random, and the only randomness
in $\hat\nu(\ln n)$ comes from $X_0$; in fact, it is easy to see that $\mathbb{P}(D_n \le k) \to
\mathbb{P}(-X_0^* \le t)$ if $k \to \infty$ and $n \to \infty$ along sequences such that $k \ln 2 - \ln n \to
t \in (-\infty, \infty)$, see [3], [?], [?, Theorem 5.7], [11]. This result can also be
expressed as $d_{TV}\bigl(D_n, \lceil (\ln n - X_0^*)/\ln 2 \rceil\bigr) \to 0$ as $n \to \infty$, where $d_{TV}$ denotes
the total variation distance of the distributions, see [?, Example 4.5].

However, if $p \ne 1/2$, then each $X_k$ is truly random, which leads to larger
dispersion of $D_n$. We can apply standard renewal theory theorems, see
Theorems A.1-A.3 and Remark A.4 in the appendix, and immediately obtain
the following. For other, earlier proofs see Knuth [22, Sections 6.3 and 5.2],
Pittel [27], [28] and Mahmoud [25, Section 5.5]. The Markov case is treated
by Jacquet and Szpankowski [17], ergodic strings by Pittel [27], and a class
of general dynamical sources by Clément, Flajolet and Vallée [5].



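Theorem 3.2. For every $p \in (0, 1)$,

$$\frac{D_n}{\ln n} \overset{p}{\longrightarrow} \frac{1}{H}, \tag{3.6}$$

with $H$ the entropy given by (2.3). Moreover, the convergence holds in every
$L^r$, $r < \infty$, too. Hence, all moments converge in (3.6) and

$$\mathbb{E} D_n^r \sim H^{-r} (\ln n)^r, \qquad 0 < r < \infty. \tag{3.7}$$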



Theorem 3.3. More precisely:

(i) If $\ln p / \ln q$ is irrational, then, as $n \to \infty$,

$$\mathbb{E} D_n = \frac{\ln n}{H} + \frac{H_2}{2H^2} + \frac{\gamma}{H} + o(1). \tag{3.8}$$

(ii) If $\ln p / \ln q$ is rational, then, as $n \to \infty$,

$$\mathbb{E} D_n = \frac{\ln n}{H} + \frac{H_2}{2H^2} + \frac{\gamma}{H} + \psi_1(\ln n) + o(1), \tag{3.9}$$

where $\psi_1(t)$ is a small continuous function, with period $d = \gcd(\ln p, \ln q)$ in
$t$, given by

$$\psi_1(t) := -\frac{1}{H} \sum_{k \ne 0} \Gamma(-2\pi i k/d)\, e^{2\pi i k t/d}. \tag{3.10}$$
Proof. The non-arithmetic case (3.8) follows directly from (3.5) and (A.4);
we can replace $X_0^{(n)}$ by the limit $X_0^*$, and since the Gumbel variable $-X_0^*$
has characteristic function $\mathbb{E}\, e^{-itX_0^*} = \Gamma(1 - it)$, we have $\mathbb{E} X_0^* = \Gamma'(1) = -\gamma$.

In the arithmetic case, we use (A.6), together with Lemma A.5 which
yields

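$$\mathbb{E}\Bigl\{\frac{t - X_0^*}{d}\Bigr\} = \frac12 - \sum_{k \ne 0} \frac{\Gamma(1 - 2\pi i k/d)}{2\pi k i}\, e^{2\pi i k t/d} = \frac12 + \frac1d \sum_{k \ne 0} \Gamma(-2\pi i k/d)\, e^{2\pi i k t/d}.$$

□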
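
As a numerical sanity check (our own illustration, not part of the proof), $\mathbb{E} D_n$ can be computed exactly from (3.1): $\mathbb{E} D_n = \sum_{k \ge 0} \mathbb{P}(D_n > k)$, where the expectation over $\Xi$ only depends on the number $j$ of ones among the first $k$ letters. The result is compared with the non-oscillating terms $\ln n / H + H_2/(2H^2) + \gamma/H$. The function names and the truncation rule are assumptions of this sketch.

```python
import math

EULER_GAMMA = 0.5772156649015329

def expected_depth(n, p, eps=1e-13):
    """Exact E D_n = sum_{k>=0} P(D_n > k), using (3.1); requires n >= 2."""
    q = 1.0 - p
    total, k = 0.0, 0
    # P(D_n > k) <= (n - 1) (p^2 + q^2)^k by a union bound; stop when negligible.
    while (n - 1) * (p * p + q * q) ** k > eps:
        for j in range(k + 1):
            prefix_prob = p**j * q**(k - j)      # P(xi_1...xi_k) = e^{-S_k}
            if prefix_prob >= 1.0:
                miss = 1.0                       # k = 0: every string shares the empty prefix
            else:
                # 1 - (1 - prefix_prob)^(n-1), evaluated without cancellation
                miss = -math.expm1((n - 1) * math.log1p(-prefix_prob))
            total += math.comb(k, j) * prefix_prob * miss
        k += 1
    return total

def asymptotic_depth(n, p):
    """ln n / H + H_2 / (2 H^2) + gamma / H, the non-oscillating terms."""
    q = 1.0 - p
    H = -p * math.log(p) - q * math.log(q)
    H2 = p * math.log(p) ** 2 + q * math.log(q) ** 2
    return math.log(n) / H + H2 / (2 * H * H) + EULER_GAMMA / H

exact = expected_depth(10_000, 0.5)
approx = asymptotic_depth(10_000, 0.5)
```

For $p = 1/2$ the two values agree closely, the periodic term $\psi_1$ being very small in that case. For other $p$ the $o(1)$ term may decay slowly, so we make no claim about the size of the gap there.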

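Theorem 3.4. Suppose that $p \in (0, 1)$. Then, as $n \to \infty$,

$$\frac{D_n - H^{-1} \ln n}{\sqrt{\ln n}} \overset{d}{\longrightarrow} N\Bigl(0, \frac{\sigma^2}{H^3}\Bigr),$$

with $\sigma^2 = H_2 - H^2 = pq(\ln p - \ln q)^2$. If $p \ne 1/2$, then $\sigma^2 > 0$ and this can
be written as

$$D_n \sim \mathrm{AsN}\bigl(H^{-1} \ln n,\; H^{-3} \sigma^2 \ln n\bigr).$$

Moreover,

$$\operatorname{Var} D_n = \frac{\sigma^2}{H^3} \ln n + o(\ln n).$$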


In the argument above, $X_0$ depends on $n$. This is a nuisance, although
no real problem (see Remark A.4). An alternative that avoids this problem
is to Poissonize by considering a random number of strings. In this case
it is simplest to consider $1 + \mathrm{Po}(\lambda)$ strings, so that a selected string $\Xi$ is
compared to a Poisson number $\mathrm{Po}(\lambda)$ of other strings, for a parameter $\lambda \to \infty$.
Conditioned on $\Xi$, the number of other strings beginning with $\xi_1 \cdots \xi_k$ then
has the Poisson distribution $\mathrm{Po}(\lambda P(\xi_1 \cdots \xi_k))$. Thus we obtain instead of
(3.1), now denoting the depth by $\tilde D_\lambda$,

$$\mathbb{P}(\tilde D_\lambda \le k \mid \Xi) = e^{-\lambda P(\xi_1 \cdots \xi_k)} = e^{-\lambda e^{-S_k}} = e^{-e^{-(S_k - \ln \lambda)}}$$
$$= \mathbb{P}(-X_0^* \le S_k - \ln \lambda \mid \Xi) = \mathbb{P}(S_k + X_0^* \ge \ln \lambda \mid \Xi) = \mathbb{P}(\hat\nu(\ln \lambda) \le k \mid \Xi),$$

where $X_0 := X_0^*$ now is independent of $n$, and consequently $\tilde D_\lambda \overset{d}{=} \hat\nu(\ln \lambda)$.
We obtain the same asymptotics as for $D_n$ above, directly from Theorems
A.1-A.3. It is in this case easy to depoissonize, by noting that $D_n$ is
stochastically monotone in $n$, and derive the results for $D_n$ from the results for $\tilde D_\lambda$
by choosing $\lambda = n \pm n^{2/3}$; we omit the details.
4. Imbalance in tries

Mahmoud [26] studied the imbalance factor of a string in a trie, defined
as the number of steps to the right minus the number of steps to the left in
the path from the root to the leaf where the string is stored. We define

$$Y_i := 2\xi_i - 1 = \begin{cases} +1, & \xi_i = 1, \\ -1, & \xi_i = 0, \end{cases}$$

and denote the corresponding partial sums by $V_k := \sum_{i=1}^k Y_i$. Thus the
imbalance factor $\Delta_n$ of the string $\Xi$ in a random trie with $n$ strings is $V_{D_n}$,
with $D_n$ as in Section 3 the depth of the string.

It follows immediately from (3.3) that (3.4) holds also conditioned on the
sequence $(Y_1, Y_2, \dots)$. As a consequence, for any $k$ and $v$,

$$\mathbb{P}(D_n = k \mid V_k = v) = \mathbb{P}(\hat\nu(\ln n) = k \mid V_k = v),$$

which shows that

$$(D_n, \Delta_n) = (D_n, V_{D_n}) \overset{d}{=} \bigl(\hat\nu(\ln n), V_{\hat\nu(\ln n)}\bigr).$$

In particular,

$$\Delta_n \overset{d}{=} V_{\hat\nu(\ln n)}.$$

We may apply Theorem A.8 (and Remark A.9). With $\mu_X := \mathbb{E} X_1 = H$ and
$\mu_Y := \mathbb{E} Y_1 = p - q$, a simple calculation yields
$\operatorname{Var}(\mu_X Y_1 - \mu_Y X_1) = pq(\ln p + \ln q)^2 = pq \ln^2(pq)$, and we obtain the central
limit theorem by Mahmoud [26]:

Theorem 4.1. As $n \to \infty$,

$$\Delta_n \sim \mathrm{AsN}\Bigl(\frac{p - q}{H} \ln n,\; \frac{pq \ln^2(pq)}{H^3} \ln n\Bigr).$$
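
The "simple calculation" behind the variance constant can be reproduced numerically. The sketch below (our own illustration; the function name is hypothetical) evaluates $\operatorname{Var}(\mu_X Y_1 - \mu_Y X_1)$ directly from the two-point distribution of $(X_1, Y_1)$, with $\mu_X = H$ and $\mu_Y = p - q$, and returns the centering and variance coefficients of $\ln n$ in the normal approximation.

```python
import math

def imbalance_parameters(p):
    """Return (mean coefficient, variance coefficient, Var(mu_X*Y_1 - mu_Y*X_1))."""
    q = 1.0 - p
    H = -p * math.log(p) - q * math.log(q)
    mu_X, mu_Y = H, p - q                        # E X_1 and E Y_1
    v1 = mu_X * 1 - mu_Y * (-math.log(p))        # value when xi_1 = 1
    v0 = mu_X * (-1) - mu_Y * (-math.log(q))     # value when xi_1 = 0
    mean = p * v1 + q * v0                       # equals 0 up to rounding
    var = p * (v1 - mean) ** 2 + q * (v0 - mean) ** 2
    return mu_Y / mu_X, var / mu_X**3, var

mean_coef, var_coef, var_raw = imbalance_parameters(0.3)
closed_form = 0.3 * 0.7 * math.log(0.3 * 0.7) ** 2   # pq ln^2(pq)
```

For $p = 1/2$ the mean coefficient vanishes, as it must by symmetry, while the variance coefficient remains positive: the imbalance then fluctuates around zero.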



5. The expected size of a trie 

A trie built of $n$ strings as in Section 3 has $n$ external nodes, since each
external node contains exactly one string. However, the number of internal
nodes, $W_n$, say, is random. We will study its expectation. For simplicity we
Poissonize directly and consider a trie constructed from $\mathrm{Po}(\lambda)$ strings; we
let $\tilde W_\lambda$ be the number of internal nodes. The results below have previously
been found by other methods, in particular, more precise asymptotics have
been found using Mellin transforms; see Knuth [22], Mahmoud [25], Fayolle,
Flajolet, Hofri and Jacquet [?], and, in particular, Jacquet and Régnier
[?], [?]. The Markov case is studied by Régnier [30] and dynamical sources
by Clément, Flajolet and Vallée [5].

If $\alpha = \alpha_1 \cdots \alpha_k$ is a finite string, let $I(\alpha)$ be the indicator of the event
that $\alpha$ is an internal node in the trie. We found above that this event occurs
if and only if there are at least two strings beginning with $\alpha$. In our Poisson
model, the number of strings beginning with $\alpha$ has a Poisson distribution
$\mathrm{Po}(\lambda P(\alpha))$, and thus

$$\mathbb{E} \tilde W_\lambda = \sum_{\alpha \in \mathcal{A}^*} \mathbb{E}\, I(\alpha) = \sum_{\alpha \in \mathcal{A}^*} \mathbb{P}\bigl(\mathrm{Po}(\lambda P(\alpha)) \ge 2\bigr) = \sum_{\alpha \in \mathcal{A}^*} f(\lambda P(\alpha)), \tag{5.1}$$

where

$$f(x) := \mathbb{P}(\mathrm{Po}(x) \ge 2) = 1 - (1 + x)e^{-x}. \tag{5.2}$$

Sums of the type in (5.1) are often studied using Mellin transform
inversion and residue calculus. Renewal theory presents an alternative. As said
in the introduction, this opens the way to straightforward generalizations,
e.g. to Markov sources.
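
For a concrete feeling for the sum (5.1), it can be evaluated numerically by grouping the $\binom{k}{j}$ strings of length $k$ with $j$ ones, all of which have $P(\alpha) = p^j q^{k-j}$. The following is a minimal sketch of our own; the truncation rule uses the elementary bound $f(x) \le x^2/2$, and $f$ is rewritten as $(1 - e^{-x}) - x e^{-x}$ to limit cancellation for small $x$.

```python
import math

def f_internal(x):
    """f(x) = P(Po(x) >= 2) = 1 - (1 + x) e^{-x}, cf. (5.2)."""
    return -math.expm1(-x) - x * math.exp(-x)

def expected_internal_nodes(lam, p, eps=1e-12):
    """E W_lambda as the sum over finite strings alpha of f(lam * P(alpha)), cf. (5.1)."""
    q = 1.0 - p
    total, k = 0.0, 0
    # Level k contributes at most lam^2 (p^2 + q^2)^k / 2, since f(x) <= x^2 / 2.
    while 0.5 * lam * lam * (p * p + q * q) ** k > eps:
        total += sum(math.comb(k, j) * f_internal(lam * p**j * q**(k - j))
                     for j in range(k + 1))
        k += 1
    return total

lam = 10_000.0
ratio_sym = expected_internal_nodes(lam, 0.5) / lam   # close to 1 / ln 2
ratio_03 = expected_internal_nodes(lam, 0.3) / lam    # to be compared with 1 / H
```

In the symmetric case the ratio is close to $1/\ln 2 \approx 1.4427$, which is the value $1/H$ that the renewal argument in this section produces.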

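Theorem 5.1. Suppose that $f$ is a non-negative function on $(0, \infty)$, and
that $F(\lambda) = \sum_{\alpha \in \mathcal{A}^*} f(\lambda P(\alpha))$, with $P(\alpha)$ given by (2.1). Assume further
that $f$ is a.e. continuous and satisfies the estimates

$$f(x) = O(x^2), \quad 0 < x < 1, \qquad \text{and} \qquad f(x) = O(1), \quad 1 < x < \infty. \tag{5.3}$$

Let $g(t) := e^{t} f(e^{-t})$.

(i) If $\ln p / \ln q$ is irrational, then, as $\lambda \to \infty$,

$$\frac{F(\lambda)}{\lambda} \to \frac{1}{H} \int_{-\infty}^{\infty} g(t)\, dt = \frac{1}{H} \int_0^\infty f(x)\, x^{-2}\, dx. \tag{5.4}$$

(ii) If $\ln p / \ln q$ is rational, then, as $\lambda \to \infty$,

$$\frac{F(\lambda)}{\lambda} = \frac{1}{H}\, \psi(\ln \lambda) + o(1), \tag{5.5}$$

where, with $d := \gcd(\ln p, \ln q)$ given by (2.6), $\psi$ is a bounded $d$-periodic
function having the Fourier series

$$\psi(t) \sim \sum_{m=-\infty}^{\infty} \hat\psi(m)\, e^{2\pi i m t/d} \tag{5.6}$$

with

$$\hat\psi(m) = \hat g(-2\pi m/d) = \int_{-\infty}^{\infty} e^{2\pi i m t/d} g(t)\, dt = \int_0^\infty f(x)\, x^{-2 - 2\pi i m/d}\, dx. \tag{5.7}$$

Furthermore,

$$\psi(t) = d \sum_{k=-\infty}^{\infty} g(kd - t). \tag{5.8}$$

If $f$ is continuous, then $\psi$ is too.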
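Proof. If $f_0(\alpha)$ is any non-negative function on $\mathcal{A}^*$, then, using (2.7), for
each $k \ge 0$,

$$\sum_{\alpha_1 \cdots \alpha_k} f_0(\alpha_1 \cdots \alpha_k) = \sum_{\alpha_1 \cdots \alpha_k} \frac{f_0(\alpha_1 \cdots \alpha_k)}{P(\alpha_1 \cdots \alpha_k)}\, P(\alpha_1 \cdots \alpha_k) = \mathbb{E}\, \frac{f_0(\xi_1 \cdots \xi_k)}{P(\xi_1 \cdots \xi_k)} = \mathbb{E}\bigl(e^{S_k} f_0(\xi_1 \cdots \xi_k)\bigr),$$

and thus,

$$\sum_{\alpha \in \mathcal{A}^*} f_0(\alpha) = \sum_{k=0}^{\infty} \mathbb{E}\bigl(e^{S_k} f_0(\xi_1 \cdots \xi_k)\bigr). \tag{5.9}$$

With $f_0(\alpha) = f(\lambda P(\alpha))$, we have $f_0(\xi_1 \cdots \xi_k) = f(\lambda e^{-S_k})$ and thus (5.9)
yields, recalling (2.11),

$$F(\lambda) = \sum_{\alpha \in \mathcal{A}^*} f(\lambda P(\alpha)) = \sum_{k=0}^{\infty} \mathbb{E}\bigl(e^{S_k} f(\lambda e^{-S_k})\bigr) = \int_0^\infty f(\lambda e^{-x})\, e^{x}\, dU(x).$$

Define further $f_1(x) := f(x)/x$; thus $g(t) = f_1(e^{-t})$. Then,

$$F(\lambda) = \int_0^\infty \lambda f_1(\lambda e^{-x})\, dU(x) = \lambda \int_0^\infty g(x - \ln \lambda)\, dU(x). \tag{5.10}$$

We can now apply the key renewal theorem, Theorem A.7. The function
$g$ is a.e. continuous and it follows from (5.3) that $g(t) \le C e^{-|t|}$ for some
$C$; hence $g$ is directly Riemann integrable on $(-\infty, \infty)$ by Lemma A.6. In
the non-arithmetic case (i) we obtain (5.4) from (5.10) and (A.10), since
$\mu = \mathbb{E} X_1 = H$ by (2.3) and, with $x = e^{-t}$,

$$\int_{-\infty}^{\infty} g(t)\, dt = \int_{-\infty}^{\infty} e^{t} f(e^{-t})\, dt = \int_0^\infty f(x)\, x^{-2}\, dx. \tag{5.11}$$

Similarly, the arithmetic case (ii) follows from (A.12) and (A.14)-(A.16)
together with the calculation, generalizing (5.11),

$$\hat g(s) = \int_{-\infty}^{\infty} e^{-ist} g(t)\, dt = \int_{-\infty}^{\infty} e^{(1 - is)t} f(e^{-t})\, dt = \int_0^\infty f(x)\, x^{-2 + is}\, dx.$$

(This equals the Mellin transform $f^*(-1 + is)$.) □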

Remark 5.2. The assumptions on $f$ may be weakened (with the same
proof); it suffices that $f(x) = O(x^{1+\delta})$ and $f(x) = O(x^{1-\delta})$ for $x \in (0, \infty)$
and some $\delta > 0$. If $f$ is continuous, it is obviously sufficient that these
estimates hold for small and large $x$, respectively.

Returning to $\tilde W_\lambda$, we obtain the following for the expected number of
internal nodes in the Poisson trie.

Theorem 5.3. (i) If $\ln p / \ln q$ is irrational, then, as $\lambda \to \infty$,

$$\frac{\mathbb{E} \tilde W_\lambda}{\lambda} \to \frac{1}{H}. \tag{5.12}$$

(ii) If $\ln p / \ln q$ is rational, then, as $\lambda \to \infty$,

$$\frac{\mathbb{E} \tilde W_\lambda}{\lambda} = \frac{1}{H} + \frac{1}{H}\, \psi_2(\ln \lambda) + o(1), \tag{5.13}$$

where, with $d = \gcd(\ln p, \ln q)$, $\psi_2$ is a continuous $d$-periodic function with
average 0 and Fourier expansion

$$\psi_2(t) = \sum_{k \ne 0} \frac{\Gamma(1 - 2\pi i k/d)}{1 + 2\pi i k/d}\, e^{2\pi i k t/d} = \sum_{k \ne 0} \frac{2\pi i k}{d}\, \Gamma\Bigl(-1 - \frac{2\pi i k}{d}\Bigr) e^{2\pi i k t/d}.$$

Proof. We apply Theorem 5.1 to (5.1). It follows from (5.2) that $f'(x) =
x e^{-x}$. Thus, by an integration by parts, since $f(x)/x \to 0$ as $x \to 0$ and
$x \to \infty$,

$$\int_0^\infty f(x)\, x^{-2}\, dx = \int_0^\infty f'(x)\, x^{-1}\, dx = \int_0^\infty e^{-x}\, dx = 1. \tag{5.14}$$

Consequently, (5.12) follows from (5.4).

Similarly, (5.13) follows from (5.5), and the calculation, generalizing (5.14),

$$\hat g(s) = \int_0^\infty f(x)\, x^{-2 + is}\, dx = (1 - is)^{-1} \int_0^\infty f'(x)\, x^{-1 + is}\, dx = \frac{\Gamma(1 + is)}{1 - is} = -is\, \Gamma(-1 + is). \qquad □$$

The case of a fixed number $n$ of strings is easily handled by comparison,
and (5.12) and (5.13) imply the corresponding results for $W_n$:

Theorem 5.4. (i) If $\ln p / \ln q$ is irrational, then, as $n \to \infty$,

$$\frac{\mathbb{E} W_n}{n} \to \frac{1}{H}.$$

(ii) If $\ln p / \ln q$ is rational, then, as $n \to \infty$, with $\psi_2$ as in Theorem 5.3,

$$\frac{\mathbb{E} W_n}{n} = \frac{1}{H} + \frac{1}{H}\, \psi_2(\ln n) + o(1).$$

Proof. $\mathbb{E} W_n$ is increasing in $n$. Thus, first, because $\mathbb{P}(\mathrm{Po}(2n) \ge n) \ge 1/2$,
$\mathbb{E} \tilde W_{2n} \ge \frac12 \mathbb{E} W_n$, and thus $\mathbb{E} W_n \le 2\, \mathbb{E} \tilde W_{2n} = O(n)$. Secondly, using this
estimate, the standard Chernoff concentration bounds for the Poisson
distribution easily imply, with $\lambda_\pm = n \pm n^{2/3}$, say, $\mathbb{E} \tilde W_{\lambda_-} + o(n) \le \mathbb{E} W_n \le
\mathbb{E} \tilde W_{\lambda_+} + o(n)$. The results then follow from Theorem 5.3. □

Remark 5.5. It is well-known that the periodic function $\psi_2$ above, as in
many similar results, fluctuates very little from its mean. In fact, the largest
$d$ is obtained for $p = q = 1/2$, when $d = \ln 2$. Since $\Gamma(1 + is)$ decreases rapidly
as $s \to \pm\infty$, the Fourier coefficients of $\psi_2(t)$ are very small; the largest (in
absolute value) are $|\hat\psi_2(\pm 1)| = |\Gamma(1 + 2\pi i/\ln 2)| / |1 - 2\pi i/\ln 2| \approx 0.542 \cdot
10^{-6}$, so $|\psi_2(\ln n)|$ is at most about $10^{-6}$, and the oscillations $\psi_2(\ln n)/H$ of
$\mathbb{E} W_n/n$ are bounded by $1.6 \cdot 10^{-6}$. (See for example [?, pp. 23-28].) Other
choices of $p$ yield even smaller oscillations.
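
The numerical values in this remark can be reproduced without a complex gamma function, using the standard identity $|\Gamma(1 + iy)|^2 = \pi y / \sinh(\pi y)$ for real $y$. The sketch below is our own illustration and accounts only for the two dominant terms $k = \pm 1$; the remaining Fourier coefficients are smaller by many orders of magnitude.

```python
import math

def abs_gamma_one_plus_iy(y):
    """|Gamma(1 + iy)| via |Gamma(1 + iy)|^2 = pi*y / sinh(pi*y), for y > 0."""
    return math.sqrt(math.pi * y / math.sinh(math.pi * y))

d = math.log(2)                       # span in the symmetric case p = q = 1/2
y = 2 * math.pi / d
# Size of the Fourier coefficients with k = +1 and k = -1.
coef = abs_gamma_one_plus_iy(y) / abs(complex(1.0, -y))
bound_psi2 = 2 * coef                 # both dominant terms together
bound_osc = bound_psi2 / d            # divide by H = ln 2
```

This gives a coefficient of about $0.54 \cdot 10^{-6}$ and an oscillation bound just below $1.6 \cdot 10^{-6}$, matching the figures quoted above.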
6. b-tries

As a variation, consider a $b$-trie, where each node can store $b$ strings, for
some fixed integer $b \ge 1$; as before, the internal nodes do not contain any
string. A finite string $\alpha$ now is an internal node if and only if at least $b + 1$
of the strings start with $\alpha$. In the argument above we only have to replace
(5.2) by

$$f(x) := \mathbb{P}(\mathrm{Po}(x) \ge b + 1); \tag{6.1}$$

thus $f'(x) = \mathbb{P}(\mathrm{Po}(x) = b) = x^b e^{-x}/b!$ and (5.11) yields, with an integration
by parts as in (5.14), $\int_{-\infty}^{\infty} g(t)\, dt = 1/b$. Hence, in the non-arithmetic
case when $\ln p / \ln q$ is irrational, the expected number of internal nodes is
$\mathbb{E} \tilde W_\lambda \sim \lambda/(Hb)$, as found by Jacquet and Régnier [?], [?]. In the
arithmetic case, we obtain a periodic function $\psi$, now with Fourier coefficients
$(1 + 2\pi i k/d)^{-1}\, \Gamma(b - 2\pi i k/d)/b!$.
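
The value $\int_{-\infty}^{\infty} g(t)\, dt = 1/b$ can be confirmed by direct numerical integration of $g(t) = e^{t} f(e^{-t})$ with $f$ as in (6.1). The following sketch is our own check, intended for small $b$ only; the trapezoidal rule, the integration window $[-40, 40]$ and the series cut-off are choices made for this illustration.

```python
import math

def poisson_tail(x, b, terms=30):
    """f(x) = P(Po(x) >= b + 1), cf. (6.1); intended for small b."""
    if x >= 1.0:
        # Complement of the first b + 1 Poisson probabilities.
        head = sum(x**k / math.factorial(k) for k in range(b + 1))
        return 1.0 - head * math.exp(-x)
    # Small x: sum the tail directly to avoid cancellation.
    return math.exp(-x) * sum(x**k / math.factorial(k)
                              for k in range(b + 1, b + 1 + terms))

def integral_g(b, t_min=-40.0, t_max=40.0, steps=4000):
    """Trapezoidal approximation of the integral of g(t) = e^t f(e^{-t})."""
    h = (t_max - t_min) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(t) * poisson_tail(math.exp(-t), b)
    return total * h

values = {b: integral_g(b) for b in (1, 2, 3, 4)}
```

For $b = 1$ this recovers the value 1 from (5.14), and in general the computed integrals agree with $1/b$ to high accuracy, since $g$ is smooth and decays exponentially in both directions.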

We can also analyze the external nodes. Let $Z_j$ be the number of nodes
where exactly $j$ strings are stored, $j = 1, \dots, b$. A finite string $\alpha$ is one
of these nodes if exactly $j$ of the stored strings begin with $\alpha$, and at least
$b - j + 1$ other strings begin with $\alpha'$, the sibling of $\alpha$ obtained by flipping
the last letter. (We assume that there are at least $b$ strings, so we can ignore
the root.)

Consider again the Poisson model. In the case when $\alpha$ ends with 1, i.e.,
$\alpha = \beta 1$ for some $\beta$, the probability of this event is, with $x = \lambda P(\beta)$, by
independence in the Poisson model, $\mathbb{P}(\mathrm{Po}(px) = j)\, \mathbb{P}(\mathrm{Po}(qx) > b - j)$. If
$\alpha = \beta 0$, we similarly have the probability $\mathbb{P}(\mathrm{Po}(qx) = j)\, \mathbb{P}(\mathrm{Po}(px) > b - j)$.
Summing over $\beta \in \mathcal{A}^*$, we thus obtain a sum of the type in Theorem 5.1
with $f$ replaced by



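$$f_j(x) = \mathbb{P}(\mathrm{Po}(px) = j)\, \mathbb{P}(\mathrm{Po}(qx) > b - j) + \mathbb{P}(\mathrm{Po}(qx) = j)\, \mathbb{P}(\mathrm{Po}(px) > b - j)$$
$$= \frac{p^j x^j}{j!} e^{-px} \Bigl(1 - \sum_{k=0}^{b-j} \frac{q^k x^k}{k!} e^{-qx}\Bigr) + \frac{q^j x^j}{j!} e^{-qx} \Bigl(1 - \sum_{k=0}^{b-j} \frac{p^k x^k}{k!} e^{-px}\Bigr)$$
$$= \frac{p^j x^j}{j!} e^{-px} + \frac{q^j x^j}{j!} e^{-qx} - \sum_{k=0}^{b-j} \frac{(p^j q^k + q^j p^k)\, x^{j+k}}{j!\, k!} e^{-x}.$$

We argue as above, with $g_j(t) := e^{t} f_j(e^{-t})$. We have, similarly to (5.11),
omitting some details,

$$c_j := \int_{-\infty}^{\infty} g_j(t)\, dt = \int_0^\infty f_j(x)\, x^{-2}\, dx = \begin{cases} p \ln(1/p) + q \ln(1/q) - \sum_{k=1}^{b-1} \frac{1}{k} (p q^k + q p^k), & j = 1, \\[4pt] \frac{1}{j(j-1)} - \sum_{k=0}^{b-j} \frac{(j+k-2)!}{j!\, k!} (p^j q^k + q^j p^k), & 2 \le j \le b. \end{cases} \tag{6.2}$$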
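Alternatively, using

$$f_j(x) = \frac{p^j x^j}{j!} e^{-px} \sum_{k=b-j+1}^{\infty} \frac{q^k x^k}{k!} e^{-qx} + \frac{q^j x^j}{j!} e^{-qx} \sum_{k=b-j+1}^{\infty} \frac{p^k x^k}{k!} e^{-px},$$

we find

$$c_j = \sum_{k=b-j+1}^{\infty} \frac{(j+k-2)!}{j!\, k!} (p^j q^k + q^j p^k), \qquad 1 \le j \le b. \tag{6.3}$$

More generally (except when $(j, s) = (1, 0)$),

$$\hat g_j(s) = \int_0^\infty f_j(x)\, x^{-2 + is}\, dx = \frac{\Gamma(j - 1 + is)}{j!} \bigl(p^{1 - is} + q^{1 - is}\bigr) - \sum_{k=0}^{b-j} \frac{\Gamma(j + k - 1 + is)}{j!\, k!} (p^j q^k + q^j p^k). \tag{6.4}$$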

If we use the notation $Z_{jn}$ for the trie with a fixed number $n$ of strings
and $\tilde Z_{j\lambda}$ for the Poisson model with $\mathrm{Po}(\lambda)$ strings, we obtain as above the
following result for the number of external nodes that store $j$ strings.

Theorem 6.1. (i) If $\ln p / \ln q$ is irrational, then, as $n \to \infty$, for $j =
1, \dots, b$,

$$\frac{\mathbb{E} Z_{jn}}{n} \to \pi_j := \frac{c_j}{H},$$

with $c_j$ given by (6.2)-(6.3).

(ii) If $\ln p / \ln q$ is rational, then, as $n \to \infty$, for $j = 1, \dots, b$,

$$\frac{\mathbb{E} Z_{jn}}{n} = \psi_{bj}(\ln n) + o(1),$$

where $\psi_{bj}$ is a continuous $d$-periodic function, with $d$ as in Theorem 5.1; $\psi_{bj}$
has average $\pi_j$ and Fourier expansion

$$\psi_{bj}(t) = H^{-1} \sum_{k=-\infty}^{\infty} \hat g_j(-2\pi k/d)\, e^{2\pi i k t/d} = \pi_j + H^{-1} \sum_{k \ne 0} \hat g_j(-2\pi k/d)\, e^{2\pi i k t/d},$$

with $\hat g_j$ given by (6.4). The same results (with $n$ replaced by $\lambda$) hold for
$\tilde Z_{j\lambda}$ in the Poisson model.



Proof. As just said, the Poisson case follows from (5.1), and it remains only
to depoissonize. To do this, choose $\lambda = n$, and let $N \sim \mathrm{Po}(n)$ be the number
of strings in the Poisson model. We couple the trie with $n$ strings and the
Poisson trie with $N$ strings by starting with $\min(n, N)$ common strings. If
we add a new string to the trie, it is either stored in an existing leaf or it
converts a leaf to an internal node and adds two new leaves (and possibly a
chain of further internal nodes). Thus at most 3 leaves are affected, and
each $Z_j$ changes by at most 3. Since we add $\max(n, N) - \min(n, N) =
|N - n|$ new strings, we have $|\tilde Z_{j\lambda} - Z_{jn}| \le 3|N - n|$ for each $j$, and thus
$|\mathbb{E} \tilde Z_{j\lambda} - \mathbb{E} Z_{jn}| \le 3\, \mathbb{E}|N - n| = O(\sqrt{n})$. □



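For example, for $b = 2, 3, 4$ we have the following limits in the
non-arithmetic case, and up to small oscillations also in the arithmetic case:

  b    $\pi_1$                                              $\pi_2$                              $\pi_3$                                    $\pi_4$
  2    $1 - \frac{2}{H} pq$                                 $\frac{1}{H} pq$
  3    $1 - \frac{5}{2H} pq$                                $\frac{1}{2H} pq$                    $\frac{1}{2H} pq$
  4    $1 - \frac{17}{6H} pq + \frac{2}{3H} (pq)^2$         $\frac{1}{2H} pq - \frac{1}{H} (pq)^2$   $\frac{1}{6H} pq + \frac{2}{3H} (pq)^2$    $\frac{1}{3H} pq - \frac{1}{6H} (pq)^2$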


Note that $\sum_{j=1}^{b} j \pi_j = 1$, or equivalently $\sum_{j=1}^{b} j c_j = H$, since the total number
of strings in the leaves is $n$; this can also be verified from (6.2).
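
The identity $\sum_j j c_j = H$, and the agreement between the finite expression (6.2) and the series (6.3), can be checked numerically. The sketch below is our own illustration; the function names, the series cut-off `k_max` and the test value $p = 0.3$ are arbitrary choices.

```python
import math

def c_finite(j, b, p):
    """c_j computed from the finite expression (6.2)."""
    q = 1.0 - p
    if j == 1:
        H = -p * math.log(p) - q * math.log(q)
        return H - sum((p * q**k + q * p**k) / k for k in range(1, b))
    head = sum(math.factorial(j + k - 2) / (math.factorial(j) * math.factorial(k))
               * (p**j * q**k + q**j * p**k) for k in range(0, b - j + 1))
    return 1.0 / (j * (j - 1)) - head

def c_series(j, b, p, k_max=400):
    """c_j computed from the series (6.3), truncated at k_max."""
    q = 1.0 - p
    return sum(math.factorial(j + k - 2) / (math.factorial(j) * math.factorial(k))
               * (p**j * q**k + q**j * p**k) for k in range(b - j + 1, k_max + 1))

p, b = 0.3, 4
q = 1.0 - p
H = -p * math.log(p) - q * math.log(q)
c = [c_finite(j, b, p) for j in range(1, b + 1)]
weighted_sum = sum(j * cj for j, cj in enumerate(c, start=1))   # should equal H
```

With $b = 2$ the same functions give $c_2 = pq$ and $c_1 = H - 2pq$, that is, $\pi_2 = pq/H$ and $\pi_1 = 1 - 2pq/H$.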

7. Patricia tries 

Another version of the trie is the Patricia trie, where the trie is compressed
by eliminating all internal nodes with only one child. (We use the notations
above with a superscript P for the Patricia case.) Since each internal node
in the Patricia trie thus has exactly 2 children, the number of internal nodes
is one less than the number of external nodes, i.e. $W_n^P = n - 1$ for a Patricia
trie with $n$ strings.

As another illustration of Theorem 5.1 we note that this trivial result, to
the first order at least, also can be derived as above. The condition for a finite
string $\alpha$ to be an internal node of the Patricia trie is that there is at least one
string beginning with $\alpha 0$ and at least one string beginning with $\alpha 1$. In the
Poisson model, the numbers of strings with these beginnings are independent
Poisson random variables with means $\lambda P(\alpha 0) = \lambda q P(\alpha)$ and $\lambda P(\alpha 1) =
\lambda p P(\alpha)$, and we can argue as above with $f(x) = (1 - e^{-px})(1 - e^{-qx})$. In
this case, $\int_{-\infty}^{\infty} g(t)\, dt = \int_0^\infty f(x)\, x^{-2}\, dx = -p \ln p - q \ln q = H$, which implies
$\mathbb{E} \tilde W_\lambda^P \sim \lambda$ and $\mathbb{E} W_n^P \sim n$ in the non-arithmetic case. Moreover, we know
that this holds in the arithmetic case too, without oscillations, which means
that $\hat\psi(m) = 0$ for $m \ne 0$ in (5.6)-(5.7). Indeed, for example by integration
by parts,

$$\hat g(s) = \int_0^\infty f(x)\, x^{-2 + is}\, dx = \int_0^\infty x^{-2 + is} \bigl(1 - e^{-px} - e^{-qx} + e^{-x}\bigr)\, dx = \bigl(1 - p^{1 - is} - q^{1 - is}\bigr)\, \Gamma(-1 + is),$$

and thus $\hat\psi(m) = \hat g(-2\pi m/d) = 0$ for $m \ne 0$.

We can also consider a Patricia $b$-trie, and obtain the asymptotics of the
expected number of internal nodes in a similar way, but it is simpler to use
the result in Theorem 6.1 and the fact that the number of internal nodes
is $\sum_{j=1}^{b} Z_{jn}^P - 1 = \sum_{j=1}^{b} Z_{jn} - 1$; in the non-arithmetic case this yields the
asymptotics $\bigl(\sum_{j=1}^{b} \pi_j\bigr) n$.

The number of internal nodes in the Patricia trie is reduced to $n - 1$ from
about $n/H$ in the trie (see Theorem 5.4 and ignore the small oscillations
in the arithmetic case); this is a reduction by a factor $H$ which is at most
$\ln 2 \approx 0.693$, in other words a reduction by at least 30%. Nevertheless,
the reduction in the path length to a given string is negligible. In fact, if we
for simplicity, as in Section 3, consider $1 + \mathrm{Po}(\lambda)$ strings, with one selected
string $\Xi$, then a string $\alpha$ is an internal node on the path in the trie from
the root to $\Xi$ such that $\alpha$ does not appear in the Patricia trie if and only if
$\Xi$ begins with $\alpha$, and further, either $\Xi$ begins with $\alpha 0$, there is at least one
other such string, and there is no string beginning with $\alpha 1$, or, conversely,
$\Xi$ and at least one other string begin with $\alpha 1$ but no string begins with
$\alpha 0$. The probability of this is $\lambda^{-1} f(x)$ with $x = \lambda P(\alpha)$ and

$$f(x) := xq\bigl(1 - e^{-qx}\bigr) e^{-px} + xp\bigl(1 - e^{-px}\bigr) e^{-qx}.$$

Hence, if $\Delta \tilde D_\lambda := \tilde D_\lambda - \tilde D_\lambda^P$ is the difference between the path lengths to $\Xi$
in the trie and in the Patricia trie, then $\mathbb{E}\, \Delta \tilde D_\lambda = \lambda^{-1} \sum_{\alpha} f(\lambda P(\alpha))$ and
Theorem 5.1 yields



$$\mathbb{E}\, \Delta \tilde D_\lambda \to \frac{1}{H} \int_0^\infty f(x)\, x^{-2}\, dx = \frac{q}{H} \int_0^\infty \frac{e^{-px} - e^{-x}}{x}\, dx + \frac{p}{H} \int_0^\infty \frac{e^{-qx} - e^{-x}}{x}\, dx = \frac{-q \ln p - p \ln q}{H}.$$

This holds also in the arithmetic case, since a simple calculation shows
that the Fourier coefficients $\hat\psi(m)$ in (5.7) vanish for all $m \ne 0$. (This is an
interesting example of cancellation in an arithmetic case where we would
expect oscillations.) Hence the expected saving is 1 for $p = 1/2$, and $O(1)$
for any fixed $p$. (This is $o(\mathbb{E} \tilde D_\lambda)$ and thus asymptotically negligible.)
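
The limiting saving $(-q \ln p - p \ln q)/H$ is elementary to tabulate. The following sketch (our own illustration; the function name is hypothetical) evaluates it for a few values of $p$.

```python
import math

def patricia_saving(p):
    """Limit of the expected depth saving: (-q ln p - p ln q) / H."""
    q = 1.0 - p
    H = -p * math.log(p) - q * math.log(q)
    return (-q * math.log(p) - p * math.log(q)) / H

savings = {p: patricia_saving(p) for p in (0.5, 0.3, 0.1)}
```

The value at $p = 1/2$ is exactly 1, as stated above. The saving grows as $p$ moves away from $1/2$, but it stays bounded for each fixed $p$, whereas the depth itself grows like $\ln n / H$.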

Again, we can depoissonize by considering $\lambda = n \pm n^{2/3}$, and we obtain the
same result for a fixed number $n$ of strings. Together with Theorem 3.3, we
obtain the following, earlier found by Szpankowski [31]; see also Knuth [22,
Section 6.3] ($p = 1/2$) and Rais, Jacquet and Szpankowski [29]. (Dynamical
sources are considered by Bourdon [3].)

Theorem 7.1. For the expected depth $\mathbb{E} D_n^P$ in a Patricia trie:

(i) If $\ln p / \ln q$ is irrational, then, as $n \to \infty$,

$$\mathbb{E} D_n^P = \frac{\ln n}{H} + \frac{H_2}{2H^2} + \frac{\gamma + q \ln p + p \ln q}{H} + o(1).$$

(ii) If $\ln p / \ln q$ is rational, then, as $n \to \infty$,

$$\mathbb{E} D_n^P = \frac{\ln n}{H} + \frac{H_2}{2H^2} + \frac{\gamma + q \ln p + p \ln q}{H} + \psi_1(\ln n) + o(1),$$

where $\psi_1(t)$ is a small continuous function, with period $d$ in $t$, given by (3.10).

8. Insertion in a trie 

When a new string is inserted in a trie, it becomes a new external node;
it may also create one or several new internal nodes. Let $N \ge 0$ be the
number of new internal nodes.

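Theorem 8.1. As $n \to \infty$,

$$\mathbb{P}(N = 0) = 1 - \frac{2pq}{H} - \psi_3(\ln n) + o(1),$$
$$\mathbb{P}(N = j) = \Bigl(\frac{2pq}{H} + \psi_3(\ln n)\Bigr)\, 2pq\, (1 - 2pq)^{j-1} + o(1), \qquad j \ge 1,$$

where $\psi_3 = 0$ in the non-arithmetic case, while in the $d$-arithmetic case

$$\psi_3(t) = \frac{2pq}{H} \sum_{k \ne 0} \Gamma\Bigl(1 - \frac{2\pi i k}{d}\Bigr) e^{2\pi i k t/d}.$$

Further,

$$\mathbb{E} N = \frac{1}{H} + \frac{1}{2pq}\, \psi_3(\ln n) + o(1). \tag{8.1}$$

The same results hold in the Poisson case (with $n$ replaced by $\lambda$).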

Proof. Consider first the Poisson case, with insertion of $\Xi$ in a trie with
$\mathrm{Po}(\lambda)$ other strings.

Let $K$ be the length of the longest prefix of $\Xi$ that is shared with at
least two strings already existing in the trie; this is the depth of the last
internal node (in the existing trie) that the new string encounters while
being inserted.

There is either no existing string with the same $K + 1$ first letters as $\Xi$,
or exactly one such string. In the first case, $\Xi$ is inserted at depth $K + 1$
without creating any new internal nodes, so $N = 0$.

In the second case, we have reached an external node, which is converted
into an internal node, and the string that was stored there is displaced and
instead stored, together with the new string, at the end of a sequence of
$N \ge 1$ new internal nodes, where $N$ is the number of common letters, after
the $K$ first, in these two strings.

Thus, conditioned on $N \ge 1$, $N$ has a geometric distribution:

$$\mathbb{P}(N = j) = \mathbb{P}(N \ge 1)\, (p^2 + q^2)^{j-1}\, 2pq, \qquad j \ge 1. \tag{8.2}$$

Since further $\mathbb{P}(N = 0) = 1 - \mathbb{P}(N \ge 1)$, it suffices to find $\mathbb{P}(N \ge 1)$.

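For a given $k$, the event $N \ge 1$, $K = k$ and, say, $\xi_{k+1} = 1$, happens
if and only if $\xi_{k+1} = 1$ and there is exactly one existing string beginning
with $\xi_1 \cdots \xi_k 1$ and at least one beginning with $\xi_1 \cdots \xi_k 0$. The conditional
probability of this given $\alpha := \xi_1 \cdots \xi_k$ is

$$\mathbb{P}(\xi_{k+1} = 1)\, \mathbb{P}\bigl(\mathrm{Po}(\lambda P(\alpha) q) \ge 1\bigr)\, \mathbb{P}\bigl(\mathrm{Po}(\lambda P(\alpha) p) = 1\bigr) = f_1(\lambda P(\alpha)),$$

with

$$f_1(x) = p\bigl(1 - e^{-qx}\bigr)\bigl(p x e^{-px}\bigr) = p^2 x e^{-px} - p^2 x e^{-x}.$$

Thus,

$$\mathbb{P}(N \ge 1,\ K = k \text{ and } \xi_{K+1} = 1) = \mathbb{E} f_1\bigl(\lambda P(\xi_1 \cdots \xi_k)\bigr) = \mathbb{E} f_1\bigl(\lambda e^{-S_k}\bigr) = \mathbb{E} f_1\bigl(e^{-(S_k - \ln \lambda)}\bigr)$$

and, summing over $k$ and using (2.11),

$$\mathbb{P}(N \ge 1 \text{ and } \xi_{K+1} = 1) = \sum_{k=0}^{\infty} \mathbb{E} f_1\bigl(e^{-(S_k - \ln \lambda)}\bigr) = \int_0^\infty f_1\bigl(e^{-(x - \ln \lambda)}\bigr)\, dU(x).$$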

The function $g_1(x) := f_1(e^{-x})$ is directly Riemann integrable on $(-\infty,\infty)$ by Lemma A.6 (because $f_1(x) = O(x \wedge x^{-1})$), and thus the key renewal theorem, Theorem A.7, yields
$$\mathbb{P}(N\ge 1 \text{ and } \xi_{K+1}=1) = \frac1H\int_{-\infty}^{\infty} g_1(x)\,dx + \psi_{31}(\ln\lambda) + o(1), \tag{8.3}$$
where $\psi_{31}(t) = 0$ in the non-arithmetic case and
$$\psi_{31}(t) = \frac1H\sum_{m\neq 0}\hat g_1(-2\pi m/d)\,e^{2\pi\mathrm{i}mt/d} \tag{8.4}$$
in the arithmetic case.

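Routine integrations yield
$$\int_{-\infty}^{\infty} g_1(x)\,dx = \int_0^\infty f_1(y)\,\frac{dy}{y} = \int_0^\infty \bigl(p^2e^{-py} - p^2e^{-y}\bigr)\,dy = p - p^2 = pq \tag{8.5}$$
and, more generally,
$$\hat g_1(s) = \int_{-\infty}^{\infty} e^{-\mathrm{i}sx}g_1(x)\,dx = \int_0^\infty f_1(y)\,y^{\mathrm{i}s-1}\,dy = \bigl(p^{1-\mathrm{i}s} - p^2\bigr)\Gamma(1+\mathrm{i}s);$$
thus in the arithmetic case, since $p^{2\pi\mathrm{i}m/d} = 1$ for integers $m$,
$$\hat g_1(-2\pi m/d) = pq\,\Gamma(1 - 2\pi m\mathrm{i}/d). \tag{8.6}$$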
By symmetry, (8.3) implies, for similarly defined $g_0$ and $\psi_{30}$,
$$\mathbb{P}(N\ge 1 \text{ and } \xi_{K+1}=0) = \frac1H\int_{-\infty}^{\infty} g_0(x)\,dx + \psi_{30}(\ln\lambda) + o(1), \tag{8.7}$$
where, noting that (8.5) and (8.6) are symmetric in $p$ and $q$, $\int_{-\infty}^{\infty} g_0(x)\,dx = pq$ and $\psi_{30} = \psi_{31}$.

Consequently, summing (8.3) and (8.7), with $\psi_3 := \psi_{30} + \psi_{31} = 2\psi_{31}$,
$$\mathbb{P}(N\ge 1) = \frac{2pq}{H} + \psi_3(\ln\lambda) + o(1). \tag{8.8}$$

The result in the Poisson case now follows from (8.2), (8.4), (8.6) and (8.8).
For the mean we have by (8.2) and (8.8),
$$\mathbb{E} N = \sum_{j=1}^{\infty} j\,\mathbb{P}(N=j) = \frac{1}{2pq}\,\mathbb{P}(N\ge 1) = \frac1H + \frac{1}{2pq}\psi_3(\ln\lambda) + o(1).$$
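
As a numerical sanity check of (8.8) (not part of the original argument), the Poisson-case probability can be evaluated directly from the sum over $k$ used in the proof. The following is a minimal sketch, assuming the convention $\mathbb{P}(\xi_i = 1) = p$ used above; the function names and the truncation level `kmax` are illustrative choices, not taken from the paper.

```python
import math


def entropy(p):
    """Entropy H = -p ln p - q ln q of a single letter."""
    q = 1.0 - p
    return -p * math.log(p) - q * math.log(q)


def prob_new_internal_node(lam, p, kmax=200):
    """P(N >= 1) in the Poisson case, as the sum over k of E[(f_1 + f_0)(lam * P(xi_1...xi_k))].

    f_1 is the function from the proof; f_0 is its mirror image with p and q
    swapped.  The sum over k is truncated at kmax (an assumption: the omitted
    terms are negligible once lam * max(p, q)**k is tiny).
    """
    q = 1.0 - p

    def f1(x):
        return p * p * x * (math.exp(-p * x) - math.exp(-x))

    def f0(x):
        return q * q * x * (math.exp(-q * x) - math.exp(-x))

    total = 0.0
    for k in range(kmax + 1):
        for j in range(k + 1):            # j letters equal to 1 among the first k
            prob = p**j * q**(k - j)      # P(alpha) for one such prefix alpha
            weight = math.comb(k, j) * prob
            x = lam * prob
            total += weight * (f1(x) + f0(x))
    return total


if __name__ == "__main__":
    p, lam = 0.5, 1000.0
    exact = prob_new_internal_node(lam, p)
    leading = 2 * p * (1 - p) / entropy(p)      # 2pq/H, the leading term in (8.8)
    print("P(N >= 1):", exact, " 2pq/H:", leading)
    print("E N = P(N >= 1)/(2pq):", exact / (2 * p * (1 - p)), " 1/H:", 1 / entropy(p))
```

In the symmetric case $p = q = 1/2$ the oscillating term $\psi_3$ is numerically tiny, so the computed probability should sit very close to $2pq/H = 1/(2\ln 2)$.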

To depoissonize, consider first adding $\Xi$ to a trie with $\mathrm{Po}(n - n^{2/3})$ strings, and then increase the family by adding $\mathrm{Po}(2n^{2/3})$ further strings; it is easily seen that with probability $1 - O(n^{-1/3}) = 1 - o(1)$, this does not change the place where $\Xi$ is inserted, and thus not $N$. The same holds for all intermediate tries, in particular for the one with exactly $n$ strings if there is one, which there is w.h.p. because $\mathbb{P}(\mathrm{Po}(n - n^{2/3}) \le n) \to 1$ and $\mathbb{P}(\mathrm{Po}(n + n^{2/3}) \ge n) \to 1$. Hence the variable $N$ is w.h.p. the same for $n$ strings and for $\mathrm{Po}(n)$ strings. □

It is easily verified that, at least if we ignore the error terms, the expected number of new internal nodes added for each new string given by (8.1) coincides with the derivative of $\mathbb{E} W_\lambda = \frac{\lambda}{H} + \frac{\lambda}{H}\psi_2(\ln\lambda) + o(\lambda)$ given by (5.13), as it should.

Remark 8.2. Christophi and Mahmoud [4] studied random climbing in random tries, taking (in one version) steps left or right with probabilities $p$ and $q$; this is like inserting a new node but without moving any old one. The length of the climb is thus $D_n$ when $N = 0$ or $1$ but $D_n - (N-1)$ when $N > 1$.

The average climb length found by Christophi and Mahmoud [4] for this version thus follows from Theorems 3.3 and 8.1.



9. TUNSTALL AND KHODAK CODES 

Tunstall and Khodak codes are variable-to-fixed length codes that are 
used in data compression. We give a brief description here. See 0], @] and 
the survey [33| for more details and references, as well as for an analysis 
using Mellin transforms. 

We recall first the general situation. The idea is that an infinite string can be parsed as a unique sequence of nonoverlapping phrases belonging to a certain (finite) dictionary $\mathcal{D}$. Each phrase in the dictionary then can be represented by a binary number of fixed length $\ell$; if there are $M$ phrases in the dictionary we take $\ell := \lceil \lg M\rceil$.

Note first that a set of phrases is a dictionary allowing a unique parsing 
in the way just described if and only if every infinite string has exactly one 
prefix in the dictionary. Equivalently, the phrases in the dictionary have to 
be the external nodes of a trie where every internal node has two children 
(so the Patricia trie is the same); this trie is the parsing tree. 

By a random phrase we mean a phrase distributed as the unique initial phrase in a random infinite string $\Xi$. Thus a phrase $\alpha$ in the dictionary $\mathcal{D}$ is chosen with probability $P(\alpha)$. We let the random variable $L$ be the length of a random phrase.

If we parse an infinite i.i.d. string $\Xi$, the successive phrases will be independent with this distribution. Hence, if $K_N$ is the (random) number of phrases required to code the $N$ first letters $\xi_1\cdots\xi_N$, then, see Appendix A and (2.8), $K_N = \nu(N-1)$ for a renewal process where the increments $X_i$ are independent copies of $L$. Consequently, as $N \to \infty$, by Theorem A.1,
$$\frac{K_N}{N} \xrightarrow{\text{a.s.}} \frac{1}{\mathbb{E} L} \quad\text{and}\quad \frac{\mathbb{E} K_N}{N} \to \frac{1}{\mathbb{E} L}. \tag{9.1}$$

We obtain also convergence of higher moments and, by Theorem A.3, a central limit theorem for $K_N$. The expected number of bits required to code a string of length $N$ is thus
$$\ell\,\mathbb{E} K_N \sim \frac{N}{\mathbb{E} L}\,\lceil \lg M\rceil.$$

For simplicity, we consider the ratio $\kappa := \lg M/\mathbb{E} L$, and call it the compression rate. (One objective of the code is to make this ratio small.)

In Khodak's construction of such a dictionary, we fix a threshold $r \in (0,1)$ and construct a parsing tree as the subtree of the complete infinite binary tree such that the internal nodes are the strings $\alpha = \alpha_1\cdots\alpha_k$ with $P(\alpha) \ge r$; the external nodes are thus the strings $\alpha$ such that $P(\alpha) < r$ but the parent, $\alpha'$ say, has $P(\alpha') \ge r$. The phrases in the Khodak code are the external nodes in this tree. For convenience, we let $R = 1/r > 1$. Let $M = M(R)$ be the number of phrases in the Khodak code.

In Tunstall's construction, we are instead given a number $M$. We start with the empty phrase and then iteratively $M-1$ times replace a phrase $\alpha$ having maximal $P(\alpha)$ by its two children $\alpha 0$ and $\alpha 1$.

It is easily seen that Khodak's construction with some $r > 0$ gives the same result as Tunstall's with $M = M(R)$. Conversely, a Tunstall code is almost a Khodak code, with $r$ chosen as the smallest $P(\alpha)$ for a proper prefix $\alpha$ of a phrase; the difference is that Tunstall's construction handles ties more flexibly; there may be some phrases too with $P(\alpha) = r$. Thus, Tunstall's construction may give any desired number $M$ of phrases, while Khodak's does not. We will see that in the non-arithmetic case, this difference is asymptotically negligible, while it is important in the arithmetic case. (This is very obvious if $p = q = 1/2$, when Khodak's code always gives a dictionary size $M$ that is a power of 2.)
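
The two constructions just described can be sketched in a few lines. This is an illustrative implementation only, under the assumption that the letter 1 has probability $p$ and the letter 0 has probability $q = 1-p$; exact rational arithmetic (`Fraction`) is used so that the ties $P(\alpha) = r$ are detected reliably. The function names are hypothetical.

```python
import heapq
from fractions import Fraction


def khodak_phrases(p, r):
    """Khodak code: leaves of the tree whose internal nodes are the strings with P(alpha) >= r.

    p and r are Fractions; returns a dict {phrase: probability}.
    """
    q = 1 - p
    phrases = {}
    stack = [("", Fraction(1))]
    while stack:
        alpha, prob = stack.pop()
        if prob >= r:                         # internal node: expand both children
            stack.append((alpha + "1", prob * p))
            stack.append((alpha + "0", prob * q))
        else:                                 # external node: alpha is a phrase
            phrases[alpha] = prob
    return phrases


def tunstall_phrases(p, M):
    """Tunstall code: split a most probable phrase M - 1 times.

    Ties are broken arbitrarily (here by string order), as the construction allows.
    """
    q = 1 - p
    heap = [(-Fraction(1), "")]               # max-heap on probability via negation
    for _ in range(M - 1):
        neg_prob, alpha = heapq.heappop(heap)
        heapq.heappush(heap, (neg_prob * p, alpha + "1"))
        heapq.heappush(heap, (neg_prob * q, alpha + "0"))
    return {alpha: -neg_prob for neg_prob, alpha in heap}


if __name__ == "__main__":
    p = Fraction(1, 3)
    khodak = khodak_phrases(p, Fraction(1, 10))
    tunstall = tunstall_phrases(p, len(khodak))
    print(len(khodak), sorted(khodak) == sorted(tunstall))
```

With $M = M(R)$ the Tunstall phrases coincide with the Khodak phrases, while for $p = q = 1/2$ the Khodak dictionary size is always a power of 2, in line with the remarks above.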

Let us first consider the number of phrases, $M = M(R)$, in Khodak's construction with a threshold $r = 1/R$. This is a purely deterministic problem, but we may nevertheless apply our probabilistic renewal theory arguments. In fact, $M$, the number of leaves in the parsing tree, equals 1 + the number of internal nodes. Thus, $M = 1 + \sum_{\alpha} f(RP(\alpha))$ with $f(x) := \mathbf{1}[x \ge 1]$, and we may apply Theorem 5.1.

Theorem 9.1. Consider the Khodak code with threshold $r = 1/R$.

(i) If $\ln p/\ln q$ is irrational, then, as $R \to \infty$,
$$\frac{M(R)}{R} \to \frac1H.$$

(ii) If $\ln p/\ln q$ is rational, then, as $R \to \infty$,
$$\frac{M(R)}{R} = \frac1H\cdot\frac{d}{1-e^{-d}}\,e^{-d\{(\ln R)/d\}} + o(1).$$

Proof. The non-arithmetic case follows directly from Theorem 5.1(i), since $\int_0^\infty f(x)x^{-2}\,dx = \int_1^\infty x^{-2}\,dx = 1$.

In the arithmetic case, we use the sum in (5.8). Since $g(t) = e^{t}\mathbf{1}[t \le 0]$, this sum is a geometric series that can be summed directly:
$$\psi(t) = d\sum_{kd\le t} e^{kd-t} = \frac{d}{1-e^{-d}}\,e^{d\lfloor t/d\rfloor - t} = \frac{d}{1-e^{-d}}\,e^{-d\{t/d\}}. \qquad\Box$$
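
The geometric series in the proof is easy to confirm numerically. The sketch below compares a truncated direct sum with the closed form; the truncation length `terms` and the function names are assumptions made for illustration.

```python
import math


def psi_direct(t, d, terms=200):
    """d * sum over integers k with k*d <= t of exp(k*d - t), truncated after `terms` terms."""
    k_max = math.floor(t / d)
    return d * sum(math.exp(k * d - t) for k in range(k_max, k_max - terms, -1))


def psi_closed(t, d):
    """Closed form d / (1 - e^{-d}) * exp(-d * {t/d})."""
    frac = t / d - math.floor(t / d)          # fractional part {t/d}
    return d / (1.0 - math.exp(-d)) * math.exp(-d * frac)


if __name__ == "__main__":
    d = math.log(2.0)
    for t in (0.3, 5.7, 12.0):
        print(t, psi_direct(t, d), psi_closed(t, d))
```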

Remark 9.2. In the arithmetic case (ii), $\ln P(\alpha)$ is a multiple of $d$ for any string $\alpha$. Hence $M(R)$ jumps only when $R \in \{e^{kd} : k \ge 0\}$, and it suffices to consider such $R$. For these $R$, the result can be written
$$M(R) \sim \frac1H\cdot\frac{d}{1-e^{-d}}\,R, \qquad \ln R \in d\mathbb{Z}. \tag{9.2}$$
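
Since all strings with $j$ ones and $k$ zeros have the same probability $p^jq^k$, the dictionary size $M(R)$ can be counted exactly without building the tree. The following sketch (function names are hypothetical; $p$ is assumed to be the probability of the letter 1 and is passed as a `Fraction` so that comparisons with $r$ are exact) counts the internal nodes and evaluates the right-hand side of (9.2) for comparison.

```python
import math
from fractions import Fraction


def khodak_size(p, R):
    """M(R) = 1 + #{alpha : P(alpha) >= 1/R}, counting strings by their letter counts."""
    q = 1 - p
    r = 1 / Fraction(R)
    internal = 0
    prob_ones = Fraction(1)                   # p**j
    j = 0
    while prob_ones >= r:
        prob = prob_ones                      # p**j * q**k
        k = 0
        while prob >= r:
            internal += math.comb(j + k, j)   # strings with j ones and k zeros
            prob *= q
            k += 1
        prob_ones *= p
        j += 1
    return internal + 1


def khodak_size_arithmetic(p, R, d):
    """Right-hand side of (9.2); meaningful only when ln R is a multiple of d."""
    pf = float(p)
    qf = 1.0 - pf
    H = -pf * math.log(pf) - qf * math.log(qf)
    return d / (1.0 - math.exp(-d)) * float(R) / H


if __name__ == "__main__":
    print(khodak_size(Fraction(1, 2), 1024),
          khodak_size_arithmetic(Fraction(1, 2), 1024, math.log(2.0)))
```

For $p = q = 1/2$ and $R = 2^{10}$ both values equal $2^{11}$, so (9.2) happens to be exact in this symmetric example.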

Next, consider the length $L$ of a random phrase. We will use the notation $L^T_M$ for a Tunstall code with $M$ phrases and $L^K_R$ for a Khodak code with threshold $r = 1/R$.

Consider first the Khodak code. By construction, given a random string $\Xi = \xi_1\xi_2\cdots$, the first phrase in it is $\xi_1\cdots\xi_n$ where $n$ is the smallest integer such that $P(\xi_1\cdots\xi_n) = e^{-S_n} < r = e^{-\ln R}$. Hence, by (2.8),
$$L^K_R = \nu(\ln R). \tag{9.3}$$

Hence, Theorems A.1-A.3 immediately yield the following (as well as convergence of higher moments).

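Theorem 9.3. For the Khodak code, the following holds as $R \to \infty$, with $\sigma^2 = H_2 - H^2 = pq\ln^2(p/q)$:
$$\frac{L^K_R}{\ln R} \xrightarrow{\text{a.s.}} \frac1H, \tag{9.4}$$
$$L^K_R \sim \mathrm{AsN}\Bigl(\frac{\ln R}{H},\ \frac{\sigma^2}{H^3}\ln R\Bigr), \tag{9.5}$$
$$\operatorname{Var} L^K_R \sim \frac{\sigma^2}{H^3}\ln R. \tag{9.6}$$
If $\ln p/\ln q$ is irrational, then
$$\mathbb{E} L^K_R = \frac{\ln R}{H} + \frac{H_2}{2H^2} + o(1). \tag{9.7}$$
If $\ln p/\ln q$ is rational, then, with $d := \gcd(\ln p, \ln q)$ given by (2.6),
$$\mathbb{E} L^K_R = \frac{\ln R}{H} + \frac{H_2}{2H^2} + \frac{d}{H}\Bigl(\frac12 - \Bigl\{\frac{\ln R}{d}\Bigr\}\Bigr) + o(1). \tag{9.8}$$

In the arithmetic case, as said in Remark 9.2, it suffices to consider thresholds such that $-\ln r = \ln R$ is a multiple of $d$; in this case (9.8) becomes
$$\mathbb{E} L^K_R = \frac{\ln R}{H} + \frac{H_2}{2H^2} + \frac{d}{2H} + o(1). \tag{9.9}$$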
We analyze the Tunstall code by comparing it to the Khodak code. Thus, suppose that $M$ is given, and increase $R$ (decrease $r$) until we find a Khodak code with $M(R) \ge M$ phrases. (By our definitions, $M(R)$ is right-continuous, so a smallest such $R$ exists.) Let $M_+ := M(R) \ge M$ and $M_- := M(R-) < M$. Thus, there are $M_+ - 1$ strings $\alpha$ with $P(\alpha) \ge r = R^{-1}$, and $M_- - 1$ strings with $P(\alpha) > r$; consequently there are $M_+ - M_-$ strings with $P(\alpha) = r$. The strings with $P(\alpha) = r$ are not parsing phrases in the Khodak code (while all their children are), but we use some of them in the Tunstall code to achieve exactly $M$ parsing phrases. Since each of these strings replaces two parsing phrases in the Khodak code, the total number of parsing phrases decreases by 1 for each used string with $P(\alpha) = r$, and thus the Tunstall code uses $M(R) - M = M_+ - M$ parsing phrases with $P(\alpha) = r$. The length $L^T_M$ of a random phrase, realized as the first phrase in $\Xi$, equals $L^K_R$ unless $\Xi$ begins with one of the phrases $\alpha$ in the Tunstall code with $P(\alpha) = r$, in which case $L^T_M = L^K_R - 1$. The probability of the latter event is evidently $P(\alpha) = r$ for each such $\alpha$, and is thus $(M(R) - M)r$ in total. Consequently, with $R$ as above,
$$L^T_M = L^K_R - \Delta_M, \tag{9.10}$$
where $\Delta_M \in \{0,1\}$ and $\mathbb{P}(\Delta_M = 1) = (M(R) - M)/R$. We can now find the results for $L^T_M$:

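Theorem 9.4. For the Tunstall code, the following holds as $M \to \infty$, with $\sigma^2 = H_2 - H^2 = pq\ln^2(p/q)$:
$$\frac{L^T_M}{\ln M} \xrightarrow{\text{a.s.}} \frac1H, \tag{9.11}$$
$$L^T_M \sim \mathrm{AsN}\Bigl(\frac{\ln M}{H},\ \frac{\sigma^2}{H^3}\ln M\Bigr), \tag{9.12}$$
$$\operatorname{Var} L^T_M \sim \frac{\sigma^2}{H^3}\ln M. \tag{9.13}$$
If $\ln p/\ln q$ is irrational, then
$$\mathbb{E} L^T_M = \frac{\ln M}{H} + \frac{\ln H}{H} + \frac{H_2}{2H^2} + o(1). \tag{9.14}$$
If $\ln p/\ln q$ is rational, then, with $d := \gcd(\ln p, \ln q)$ given by (2.6),
$$\mathbb{E} L^T_M = \frac{\ln M}{H} + \frac{\ln H}{H} + \frac{H_2}{2H^2} + \frac1H\ln\frac{\sinh(d/2)}{d/2} + \frac dH\,\psi_4\Bigl(\Bigl\{\frac{\ln M + \ln\bigl(H(1-e^{-d})/d\bigr)}{d}\Bigr\}\Bigr) + o(1), \tag{9.15}$$
where
$$\psi_4(x) := \frac{e^{dx}-1}{e^{d}-1} - x. \tag{9.16}$$

Note that $\psi_4$ is continuous, with $\psi_4(0) = \psi_4(1) = 0$. $\psi_4$ is convex and thus $\psi_4 \le 0$ on $[0,1]$. In the symmetric case $p = q = 1/2$, $d = H = \ln 2$ and $\psi_4(x) = 2^x - 1 - x$, with a minimum $-0.086071\ldots$.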
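
The quantities in Theorem 9.4 can be checked numerically in the symmetric case. The sketch below is illustrative only: it evaluates $\psi_4$ and its minimum, computes $\mathbb{E} L^T_M$ exactly from Tunstall's construction, and compares this with the right-hand side of (9.15) without the $o(1)$ term, specialised to $p = q = 1/2$ (so $d = H = \ln 2$ and $H_2 = \ln^2 2$). The function names are assumptions, not notation from the paper.

```python
import heapq
import math


def psi4(x, d):
    """psi_4(x) = (e^{dx} - 1)/(e^d - 1) - x, as in (9.16)."""
    return math.expm1(d * x) / math.expm1(d) - x


def psi4_minimum(d):
    """Location and value of the minimum of psi_4 on [0, 1] (zero of the derivative)."""
    x_star = math.log(math.expm1(d) / d) / d
    return x_star, psi4(x_star, d)


def tunstall_mean_length(p, M):
    """Exact E L for the Tunstall code with M phrases, letter 1 having probability p."""
    q = 1.0 - p
    heap = [(-1.0, 0)]                        # (-probability, depth) of current phrases
    for _ in range(M - 1):
        neg_prob, depth = heapq.heappop(heap)
        heapq.heappush(heap, (neg_prob * p, depth + 1))
        heapq.heappush(heap, (neg_prob * q, depth + 1))
    return sum(-neg_prob * depth for neg_prob, depth in heap)


def tunstall_mean_formula_symmetric(M):
    """Right-hand side of (9.15) without the o(1) term, for p = q = 1/2."""
    d = H = math.log(2.0)
    H2 = H * H
    beta = H * (1.0 - math.exp(-d)) / d
    x0 = (math.log(beta * M) / d) % 1.0       # fractional part in (9.15)
    return (math.log(M) / H + math.log(H) / H + H2 / (2 * H * H)
            + math.log(math.sinh(d / 2) / (d / 2)) / H + (d / H) * psi4(x0, d))


if __name__ == "__main__":
    print(psi4_minimum(math.log(2.0)))
    for M in (5, 8, 100):
        print(M, tunstall_mean_length(0.5, M), tunstall_mean_formula_symmetric(M))
```

In this symmetric example the two values agree up to rounding error, which is stronger than the $o(1)$ statement of the theorem; for general $p$ only asymptotic agreement should be expected.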

Proof. Let as above $R$ be the smallest number with $M(R) \ge M$; thus $M(R) \ge M > M(R-)$. By Theorem 9.1, $\ln R = \ln M + O(1)$, so (9.11)-(9.13) follow from (9.4)-(9.6) and the fact that $|L^T_M - L^K_R| \le 1$, see (9.10).

If $\ln p/\ln q$ is irrational, Theorem 9.1 yields $M(R)/R \to 1/H$, and thus also $M(R-)/R \to 1/H$. Since $M(R) \ge M > M(R-)$, also
$$\frac MR \to \frac1H \tag{9.17}$$
and further $M(R)/M \to 1$. Consequently,
$$\mathbb{E}\Delta_M = \frac{M(R)-M}{R} = \Bigl(\frac{M(R)}{M} - 1\Bigr)\frac MR \to 0,$$
and thus, by (9.10), $\mathbb{E} L^T_M = \mathbb{E} L^K_R - \mathbb{E}\Delta_M = \mathbb{E} L^K_R + o(1)$. Since also, by (9.17) again, $\ln R = \ln M + \ln H + o(1)$, (9.14) follows from (9.7).
In the case when $\ln p/\ln q$ is rational, we argue similarly, but we have to be more careful. First, necessarily $R = e^{Nd}$ for some integer $N$, see Remark 9.2. Further, (9.2) applies. Let, for convenience,
$$\beta := \frac{H(1-e^{-d})}{d}; \tag{9.18}$$
thus (9.2) can be written $M(R) \sim \beta^{-1}R$ as $R \to \infty$. Let
$$x := \frac1d\ln(\beta M) - N + 1 = \frac1d\ln\frac{\beta M}{R} + 1. \tag{9.19}$$

Then, by these definitions and (9.2),
$$M = \beta^{-1}e^{d(N-1+x)}, \tag{9.20}$$
$$M(R) = \beta^{-1}R\,(1+o(1)) = \beta^{-1}e^{dN+o(1)}, \tag{9.21}$$
$$M(R-) = M(Re^{-d}) = \beta^{-1}Re^{-d}(1+o(1)) = \beta^{-1}e^{d(N-1)+o(1)}. \tag{9.22}$$

Since $M(R-) < M \le M(R)$, we see that $o(1) < x \le 1 + o(1)$. We define also, using (9.20),
$$x_0 := \Bigl\{\frac{\ln(\beta M)}{d}\Bigr\} = \{x\}. \tag{9.23}$$

Typically, $0 < x < 1$, and then $x_0 = x$, but it may happen that $x$ is slightly below 0 and $x_0 = x + 1$, or that $x$ is slightly above 1 and then $x_0 = x - 1$.

By (9.19), $\ln R = \ln(\beta M) + d(1-x)$, and thus (9.9) yields, using (9.18),
$$\mathbb{E} L^K_R = \frac{\ln(\beta M)}{H} + \frac{H_2}{2H^2} + \frac{d}{2H} + \frac dH(1-x) + o(1) = \frac{\ln M}{H} + \frac{\ln H}{H} + \frac1H\ln\frac{\sinh(d/2)}{d/2} + \frac{H_2}{2H^2} + \frac dH(1-x) + o(1).$$
Furthermore, by $R = e^{dN}$, (9.20), (9.21) and (9.18),
$$\mathbb{E}\Delta_M = \frac{M(R)-M}{R} = \beta^{-1}\bigl(1 - e^{d(x-1)}\bigr) + o(1) = \frac dH\cdot\frac{1-e^{xd-d}}{1-e^{-d}} + o(1) = \frac dH\Bigl(1 - \frac{e^{xd}-1}{e^{d}-1}\Bigr) + o(1).$$

Combining these, we find by (9.10) and (9.16),
$$\mathbb{E} L^T_M = \mathbb{E} L^K_R - \mathbb{E}\Delta_M = \frac{\ln M}{H} + \frac{\ln H}{H} + \frac1H\ln\frac{\sinh(d/2)}{d/2} + \frac{H_2}{2H^2} + \frac dH\,\psi_4(x) + o(1).$$

This is almost (9.15), except that there $\psi_4(x)$ is replaced by $\psi_4(x_0) = \psi_4(\{\ln(\beta M)/d\})$, see (9.23). However, as noted above, $x \ne x_0$ can happen only when one of $x$ and $x_0$ is $o(1)$ and the other is $1 + o(1)$. Since the function $\psi_4$ is continuous and $\psi_4(0) = \psi_4(1)$, we see that in this case $\psi_4(x) - \psi_4(x_0) = \pm(\psi_4(1) - \psi_4(0)) + o(1) = o(1)$. Hence, $\psi_4(x) = \psi_4(x_0) + o(1)$ in all cases, and (9.15) follows. □

Remark 9.5. We have chosen to derive Theorem 9.4 from the corresponding result Theorem 9.3 for the Khodak code. An alternative is to note that in the Tunstall code, we obtain the random phrase length $L^T_M$ by stopping $\Xi$ at $M_+ - M$ of the $M_+ - M_-$ strings $\alpha$ with $P(\alpha) = r$, and all strings with smaller $P(\alpha)$. By symmetry, we obtain the same distribution of the length if we stop randomly with probability $(M_+ - M)/(M_+ - M_-)$ whenever $P(\alpha) = e^{-S_n} = r$; equivalently, we stop when $e^{-S_n - X_0} < r$, where $X_0$ is a random variable, independent of $\Xi$, with values 0 and $\varepsilon$, for some very small positive $\varepsilon = \varepsilon(M)$, and $\mathbb{P}(X_0 = \varepsilon) = (M_+ - M)/(M_+ - M_-)$. Consequently, we have $L^T_M = \hat\nu(\ln R)$, with $R$ and $X_0$ as above, and we can apply Theorems A.1-A.3 (and Remark A.4) directly.

Corollary 9.6. The compression rate for the Tunstall code is
$$\kappa = \frac{\lg M}{\mathbb{E} L^T_M} = \frac{H}{\ln 2}\Bigl(1 - \frac{\ln H + H_2/(2H) + \delta}{\ln M} + o\Bigl(\frac{1}{\ln M}\Bigr)\Bigr),$$
where $\delta = 0$ when $\ln p/\ln q$ is irrational, while when $\ln p/\ln q$ is rational,
$$\delta := \ln\frac{\sinh(d/2)}{d/2} + d\,\psi_4\Bigl(\Bigl\{\frac{\ln M + \ln\bigl(H(1-e^{-d})/d\bigr)}{d}\Bigr\}\Bigr),$$
with $d$ given by (2.6) and $\psi_4$ by (9.16).

For the Khodak code, the compression rate $\lg(M(R))/\mathbb{E} L^K_R$ is asymptotically given by the same formula, with $\ln M$ replaced by $\ln R$, except that the $\psi_4$ term does not appear in $\delta$.

The reason that the $\psi_4$ term does not appear for the Khodak code is that $L^K_R = L^T_{M(R)}$, and in the arithmetic case, we may assume that $R = e^{Nd}$, and then for $L^T_{M(R)}$, the argument $x_0$ of $\psi_4$ is $\{\ln(\beta M(R))/d\} = \{\ln(R)/d + o(1)\} = \{N + o(1)\}$ and thus close to 0 or 1, where $\psi_4$ vanishes.

10. A STOPPED RANDOM WALK 

Drmota and Szpankowski [9] consider (motivated by the study of Tunstall and Khodak codes) walks in a region in the first quadrant bounded by two crossing lines. Their first result, on the number of possible paths, seems to require a longer comment, and will not be considered here. Their second result is about a random walk in the plane taking only unit steps north or east, which is stopped when it exits the region; the probability of an east step is $p$ each time. Coding steps east by 1 and north by 0, this is the same as taking our random string $\Xi$. Drmota and Szpankowski [9] study, in our notation, the exit time
$$D_{K,V} := \min\{n : n > K \text{ or } S_n > V\ln 2\}$$
for given numbers $K$ and $V$, with $K$ integer. We thus have
$$D_{K,V} = (K+1) \wedge \nu(V\ln 2). \tag{10.1}$$

We have here kept the notations $K$ and $V$ from [9], but for convenience we in the sequel write $V_2 := V\ln 2$. We assume $p \ne q$, since otherwise $D_{K,V} = (K \wedge \lfloor V\rfloor) + 1$ is deterministic.
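
For small parameters, the mean of the exit time can be computed exactly. Since $S_n$ is non-decreasing, $D_{K,V} > n$ holds if and only if $n \le K$ and $S_n \le V_2$, so $\mathbb{E} D_{K,V} = \sum_{n=0}^{K}\mathbb{P}(S_n \le V_2)$. The sketch below implements this identity; it assumes that a letter 1 (east step) has probability $p$ and contributes $-\ln p$ to $S_n$, and the function name is hypothetical.

```python
import math


def mean_exit_time(p, K, V):
    """Exact E D_{K,V} = sum_{n=0}^{K} P(S_n <= V ln 2) for i.i.d. letters."""
    q = 1.0 - p
    a, b = -math.log(p), -math.log(q)     # increments of S_n for letters 1 and 0
    v2 = V * math.log(2.0)
    total = 0.0
    for n in range(K + 1):
        for j in range(n + 1):            # j = number of letters 1 among the first n
            if j * a + (n - j) * b <= v2:
                total += math.comb(n, j) * p**j * q**(n - j)
    return total


if __name__ == "__main__":
    # Symmetric case: D is deterministic and equals min(K, floor(V)) + 1.
    print(mean_exit_time(0.5, 10, 7.5), mean_exit_time(0.5, 5, 7.5))
    # An asymmetric example, for comparison with the asymptotics below.
    print(mean_exit_time(0.3, 40, 30.0))
```

The comparison `<= v2` is done in floating point, so values of $V$ for which some $S_n$ equals $V_2$ exactly should be avoided in experiments.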

We need a little more notation. Let as usual $\phi(x) := (2\pi)^{-1/2}e^{-x^2/2}$ and $\Phi(x) := \int_{-\infty}^{x}\phi(y)\,dy$ be the density and distribution functions of the standard normal distribution. Further, let
$$\Psi(x) := \int_{-\infty}^{x}\Phi(y)\,dy = x\Phi(x) + \phi(x). \tag{10.2}$$

This definition is motivated by the following lemma. 

Lemma 10.1. If $Z \sim N(0,1)$, then for every real $t$, $\mathbb{E}(Z \vee t) = \Psi(t)$ and $\mathbb{E}(Z \wedge t) = -\Psi(-t)$. Further, $\Psi(t) - \Psi(-t) = t$.

Proof. Since $\mathbb{E} Z = 0$,
$$\mathbb{E}(Z\vee t) = \mathbb{E}(Z\vee t - Z) = \int_0^\infty \mathbb{P}(Z\vee t - Z > x)\,dx = \int_0^\infty \Phi(t-x)\,dx = \Psi(t).$$
Further, since $-Z \overset{d}{=} Z$,
$$\mathbb{E}(Z\wedge t) = -\mathbb{E}\bigl((-Z)\vee(-t)\bigr) = -\mathbb{E}\bigl(Z\vee(-t)\bigr) = -\Psi(-t).$$
Finally, $\Psi(t) - \Psi(-t) = \mathbb{E}\bigl((Z\vee t) + (Z\wedge t)\bigr) = \mathbb{E}(Z + t) = t$. (This also follows from (10.2) and $\Phi(t) + \Phi(-t) = 1$, $\phi(-t) = \phi(t)$.) □
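
The identities in Lemma 10.1 are easy to confirm numerically. The following sketch evaluates $\Psi$ through the error function and compares $\mathbb{E}(Z\vee t)$, approximated by a plain trapezoidal rule on $[-10,10]$ (an assumption that ignores a negligible tail), with $\Psi(t)$; all function names are illustrative.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Psi(x):
    """Psi(x) = x * Phi(x) + phi(x), as in (10.2)."""
    return x * Phi(x) + phi(x)


def mean_max_numeric(t, lo=-10.0, hi=10.0, steps=200_000):
    """Trapezoidal approximation of E max(Z, t) for Z ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * max(z, t) * phi(z)
    return total * h


if __name__ == "__main__":
    for t in (-1.5, 0.0, 0.7):
        print(t, Psi(t), mean_max_numeric(t), Psi(t) - Psi(-t))
```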

We can now state our version of the result by Drmota and Szpankowski [9]. We do not obtain as sharp error estimates as they do (although our bounds easily can be improved when $|K - V_2/H|$ is large enough). On the other hand, our result is more general and includes the transition region when $V_2/H \approx K$ and both stopping conditions are important.

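Theorem 10.2. Suppose that $p \ne q$ and that $V, K \to \infty$. Let $V_2 := V\ln 2$ and $\tilde\sigma^2 := (H_2 - H^2)/H^3 > 0$.

(i) If $(K - V_2/H)/\sqrt{V_2} \to +\infty$, then $D_{K,V}$ is asymptotically normal:
$$D_{K,V} \sim \mathrm{AsN}\Bigl(\frac{V_2}{H},\ \tilde\sigma^2V_2\Bigr). \tag{10.3}$$
Further, $\operatorname{Var}(D_{K,V}) \sim \tilde\sigma^2V_2$.

(ii) If $(K - V_2/H)/\sqrt{V_2} \to -\infty$, then $D_{K,V}$ is asymptotically degenerate:
$$\mathbb{P}(D_{K,V} = K+1) \to 1. \tag{10.4}$$
Further, $\operatorname{Var} D_{K,V} = o(V_2)$.

(iii) If $(K - V_2/H)/\sqrt{V_2} \to a \in (-\infty,+\infty)$, then $D_{K,V}$ is asymptotically truncated normal:
$$V_2^{-1/2}\bigl(D_{K,V} - V_2/H\bigr) \xrightarrow{d} (\tilde\sigma Z)\wedge a = \tilde\sigma\bigl(Z\wedge(a/\tilde\sigma)\bigr), \tag{10.5}$$
with $Z \sim N(0,1)$. Further,
$$\operatorname{Var}(D_{K,V}) \sim V_2\operatorname{Var}(\tilde\sigma Z\wedge a) = V_2\,\tilde\sigma^2\operatorname{Var}\bigl(Z\wedge(a/\tilde\sigma)\bigr).$$

(iv) In every case,
$$\mathbb{E} D_{K,V} = \frac{V_2}{H} - \tilde\sigma\sqrt{V_2}\,\Psi\Bigl(\frac{V_2/H - K}{\tilde\sigma\sqrt{V_2}}\Bigr) + o\bigl(\sqrt{V_2}\bigr) \tag{10.6}$$
$$\phantom{\mathbb{E} D_{K,V}} = K - \tilde\sigma\sqrt{V_2}\,\Psi\Bigl(\frac{K - V_2/H}{\tilde\sigma\sqrt{V_2}}\Bigr) + o\bigl(\sqrt{V_2}\bigr). \tag{10.7}$$

(v) If $(K - V_2/H)/\sqrt{V_2} \ge \ln V_2$, then
$$\mathbb{E} D_{K,V} = \frac{V_2}{H} + \frac{H_2}{2H^2} + \psi_5(V_2) + o(1), \tag{10.8}$$
where $\psi_5 = 0$ in the non-arithmetic case and $\psi_5(t) = \frac dH\bigl(1/2 - \{t/d\}\bigr)$ in the $d$-arithmetic case.

(vi) If $(K - V_2/H)/\sqrt{V_2} \le -\ln V_2$, then
$$\mathbb{E} D_{K,V} = K + 1 + o(1). \tag{10.9}$$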

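Proof. Let
$$\tilde D := \frac{D_{K,V} - V_2/H}{\sqrt{V_2}}, \qquad \tilde\nu := \frac{\nu(V_2) - V_2/H}{\sqrt{V_2}}, \qquad \tilde K := \frac{K - V_2/H}{\sqrt{V_2}}, \qquad \tilde K_1 := \frac{K + 1 - V_2/H}{\sqrt{V_2}} = \tilde K + o(1).$$

Thus, by (10.1), $\tilde D = \tilde\nu\wedge\tilde K_1$. By Theorem A.3,
$$\tilde\nu = \frac{\nu(V_2) - V_2/H}{\sqrt{V_2}} \xrightarrow{d} N(0,\tilde\sigma^2). \tag{10.10}$$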

The results on convergence in distribution in (i)-(iii) follow immediately, noting that in (i), w.h.p. $\nu(V_2) < K + 1$ and thus $D_{K,V} = \nu(V_2)$; in (iii) we use e.g. the continuous mapping theorem on $\wedge : \mathbb{R}^2 \to \mathbb{R}$.

For (iv), note first that the two expressions in (10.6) and (10.7) are the same by Lemma 10.1. We may by considering subsequences assume that one of the cases (i)-(iii) occurs.

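Next, (A.9) can be written $\mathbb{E}(\tilde\nu^2) \to \tilde\sigma^2$, which together with (10.10) implies that $\tilde\nu^2$ is uniformly integrable. (See e.g. [12, Theorem 5.5.9].) In case (iii), when $\tilde K_1$ converges, this implies that $\tilde D^2 = (\tilde\nu\wedge\tilde K_1)^2$ also is uniformly integrable, and thus the convergence in distribution already proved for $\tilde D$ implies $\mathbb{E}\tilde D \to \mathbb{E}\bigl(\tilde\sigma(Z\wedge(a/\tilde\sigma))\bigr) = -\tilde\sigma\Psi(-a/\tilde\sigma)$, which yields (10.6) when $\tilde K \to a \in \mathbb{R}$; further, the uniform square integrability of $\tilde D$ implies $\operatorname{Var}\tilde D \to \operatorname{Var}(\tilde\sigma Z\wedge a)$ as asserted in (iii).

If instead $\tilde K_1 \to +\infty$, case (i), we may assume $\tilde K_1 > 0$; then $\tilde D^2 = (\tilde\nu\wedge\tilde K_1)^2 \le \tilde\nu^2$ and thus $\tilde D^2$ is uniformly integrable in this case too. Hence (10.3) implies both $\operatorname{Var}(\tilde D) \to \tilde\sigma^2$, or equivalently $\operatorname{Var} D_{K,V} \sim \tilde\sigma^2V_2$ as asserted in (i), and $\mathbb{E}\tilde D \to 0$, which yields (10.6) in this case because $\Psi(-\tilde K/\tilde\sigma) \to 0$.

Finally, if $\tilde K_1 \to -\infty$, case (ii), we may assume that $\tilde K_1 < 0$; then $\tilde K_1 - \tilde D = (\tilde K_1 - \tilde\nu)_+ \le |\tilde\nu|$ is uniformly square integrable, and $\tilde K_1 - \tilde D \xrightarrow{p} 0$ by (10.10). Hence $\tilde K_1 - \mathbb{E}\tilde D = \mathbb{E}(\tilde K_1 - \tilde D) \to 0$, and thus (10.7) holds, since $\Psi(\tilde K/\tilde\sigma) \to 0$ and $1 = o(\sqrt{V_2})$. Further, $\operatorname{Var}\tilde D = \operatorname{Var}(\tilde K_1 - \tilde D) \to 0$, which yields $\operatorname{Var} D_{K,V} = o(V_2)$.

This completes the proof of (iv).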



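For (v), we have $D_{K,V} \le \nu(V_2)$ and thus, by the Cauchy-Schwarz inequality and Theorem A.1,
$$\mathbb{E}\,|D_{K,V} - \nu(V_2)| \le \mathbb{E}\bigl(\nu(V_2)\mathbf{1}[D_{K,V} \ne \nu(V_2)]\bigr) \le \bigl(\mathbb{E}\,\nu(V_2)^2\,\mathbb{P}(D_{K,V} \ne \nu(V_2))\bigr)^{1/2} = O(V_2)\,\mathbb{P}\bigl(D_{K,V} \ne \nu(V_2)\bigr)^{1/2}. \tag{10.11}$$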



For $\tilde K \ge \ln V_2$, Chernoff's bound [20, Theorem 2.1] implies, because $S_{K+1}$ is a linear transformation of a binomial $\mathrm{Bi}(K+1, p)$ random variable,
$$\mathbb{P}\bigl(D_{K,V} \ne \nu(V_2)\bigr) = \mathbb{P}\bigl(\nu(V_2) > K+1\bigr) = \mathbb{P}\bigl(S_{K+1} \le V_2\bigr) = \mathbb{P}\bigl(S_{K+1} - \mathbb{E} S_{K+1} \le -H\sqrt{V_2}\,\tilde K_1\bigr) \le \exp\Bigl(-c_1\frac{\tilde K_1^2V_2}{K + 1 + \tilde K_1\sqrt{V_2}}\Bigr) \le \exp\bigl(-c_2\ln^2(V_2)\bigr),$$
for some $c_1, c_2 > 0$ (depending on $p$); the last inequality is perhaps most easily seen by considering the case $K + 1 \le 2V_2/H$ (when $K + 1 \asymp V_2$) and $K + 1 > 2V_2/H$ (when $\tilde K_1 \asymp K/\sqrt{V_2}$) separately. Hence, the right-hand side of (10.11) tends to 0, and thus $\mathbb{E} D_{K,V} = \mathbb{E}\,\nu(V_2) + o(1)$. Consequently, (v) follows from the formulas (A.3) and (A.5) for $\mathbb{E}\,\nu(V_2)$ provided by Theorem A.2.

The argument for (vi) is very similar. The Chernoff bound for $S_K$ implies
$$\mathbb{P}(D_{K,V} \ne K+1) = \mathbb{P}\bigl(\nu(V_2) < K+1\bigr) = \mathbb{P}(S_K > V_2) \le \exp\bigl(-c_3\ln^2(V_2)\bigr),$$
and the Cauchy-Schwarz inequality then implies $\mathbb{E}\,|K + 1 - D_{K,V}| = o(1)$, proving (vi). □
Appendix A. Some renewal theory 

For the readers' (and our own) convenience, we collect here a few standard results from renewal theory, sometimes in less standard versions. See e.g. Asmussen [1], Feller [11] or Gut [13] for further details.

We suppose that $X_1, X_2, \ldots$ is an i.i.d. sequence of non-negative random variables with finite mean $\mu := \mathbb{E} X_1 > 0$, and that $S_n := \sum_{i=1}^{n}X_i$. Moreover, we suppose that $X_0$ is independent of $X_1, X_2, \ldots$ (but $X_0$ may have a different distribution, and is not necessarily positive) and define $\hat S_n := \sum_{i=0}^{n}X_i = S_n + X_0$. We further define the first passage times $\nu(t)$
and $\hat\nu(t)$ by (2.8) and (2.13) and the renewal function $U$ by (2.10). (Recall that $\nu$ is a special case of $\hat\nu$ with $X_0 = 0$. Hence the results stated below for $\hat\nu$ hold for $\nu$ too.)

For some theorems, we have to distinguish between the arithmetic (lattice) 
and non-arithmetic (non-lattice) cases, in general defined as follows: 

arithmetic (lattice): There is a positive real number $d$ such that $X_1/d$ always is an integer. We let $d$ be the largest such number and say that $X_1$ is $d$-arithmetic. (This maximal $d$ is called the span of the distribution.)

non-arithmetic (non-lattice): No such $d$ exists. (Then $X_1$ is not supported on any proper closed subgroup of $\mathbb{R}$.)



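Theorem A.1. As $t \to \infty$,
$$\frac{\hat\nu(t)}{t} \xrightarrow{\text{a.s.}} \frac1\mu. \tag{A.1}$$
If further $0 < r < \infty$ and $\mathbb{E}|X_0|^r < \infty$, then $\hat\nu(t)/t \to \mu^{-1}$ in $L^r$, i.e., $\mathbb{E}\,|\hat\nu(t)/t - \mu^{-1}|^r \to 0$; and thus
$$\mathbb{E}\Bigl(\frac{\hat\nu(t)}{t}\Bigr)^{r} \to \mu^{-r}. \tag{A.2}$$

Proof. See e.g. Gut [13, Theorem 2.5.1] for the case $X_0 = 0$; the general case follows by essentially the same proof. □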
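Theorem A.2. Suppose that $\mathbb{E} X_1^2 < \infty$ and $\mathbb{E}|X_0| < \infty$.

(i) If the distribution of $X_1$ is non-arithmetic, then, as $t \to \infty$,
$$\mathbb{E}\,\nu(t) = \frac t\mu + \frac{\mathbb{E} X_1^2}{2\mu^2} + o(1) \tag{A.3}$$
and, more generally,
$$\mathbb{E}\,\hat\nu(t) = \frac t\mu + \frac{\mathbb{E} X_1^2}{2\mu^2} - \frac{\mathbb{E} X_0}{\mu} + o(1). \tag{A.4}$$

(ii) If the distribution of $X_1$ is $d$-arithmetic, then, as $t \to \infty$,
$$\mathbb{E}\,\nu(t) = \frac t\mu + \frac{\mathbb{E} X_1^2}{2\mu^2} + \frac d\mu\Bigl(\frac12 - \Bigl\{\frac td\Bigr\}\Bigr) + o(1) \tag{A.5}$$
and, more generally,
$$\mathbb{E}\,\hat\nu(t) = \frac t\mu + \frac{\mathbb{E} X_1^2}{2\mu^2} - \frac{\mathbb{E} X_0}{\mu} + \frac d\mu\Bigl(\frac12 - \mathbb{E}\Bigl\{\frac{t - X_0}{d}\Bigr\}\Bigr) + o(1). \tag{A.6}$$

Proof. See e.g. Gut [13, Theorem 2.5.2] for the case $X_0 = 0$; the general case follows easily by conditioning on $X_0$. In the arithmetic case, note that $\hat\nu(t) = \nu(t - X_0) = \nu(\lfloor (t - X_0)/d\rfloor d)$ and use $\mathbb{E}(\lfloor (t - X_0)/d\rfloor d) = t - \mathbb{E} X_0 - d\,\mathbb{E}\{(t - X_0)/d\}$. □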
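
For the two-point increments used throughout this paper ($X_i = -\ln p$ with probability $p$ and $X_i = -\ln q$ with probability $q$), the mean $\mathbb{E}\,\nu(t) = \sum_{n\ge 0}\mathbb{P}(S_n \le t)$ can be computed exactly and compared with the asymptotic expressions for $\mathbb{E}\,\nu(t)$ in Theorem A.2. The sketch below is illustrative; the function names are assumptions, and $\nu(t) := \min\{n : S_n > t\}$ as in the main text.

```python
import math


def renewal_mean_exact(p, t):
    """E nu(t) = sum_{n >= 0} P(S_n <= t) for increments -ln p (prob. p) and -ln q (prob. q)."""
    q = 1.0 - p
    a, b = -math.log(p), -math.log(q)
    total = 0.0
    n = 0
    while n * min(a, b) <= t:             # beyond this point P(S_n <= t) = 0
        for j in range(n + 1):            # j increments equal to a
            if j * a + (n - j) * b <= t:
                total += math.comb(n, j) * p**j * q**(n - j)
        n += 1
    return total


def renewal_mean_asymptotic(p, t, d=None):
    """t/mu + E X^2/(2 mu^2), plus the arithmetic correction (d/mu)(1/2 - {t/d}) if d is given."""
    q = 1.0 - p
    mu = -p * math.log(p) - q * math.log(q)               # = H
    second = p * math.log(p) ** 2 + q * math.log(q) ** 2  # = H_2
    value = t / mu + second / (2.0 * mu * mu)
    if d is not None:
        value += (d / mu) * (0.5 - (t / d - math.floor(t / d)))
    return value


if __name__ == "__main__":
    d = math.log(2.0)
    print(renewal_mean_exact(0.5, 5.0), renewal_mean_asymptotic(0.5, 5.0, d))
    print(renewal_mean_exact(0.3, 20.0), renewal_mean_asymptotic(0.3, 20.0))
```

In the symmetric case $p = q = 1/2$ the arithmetic formula is exact, since $\nu(t) = \lfloor t/\ln 2\rfloor + 1$ is deterministic; in general only the $o(1)$ agreement stated in the theorem should be expected.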

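Theorem A.3. Assume that $\sigma^2 := \operatorname{Var} X_1 < \infty$. Then, as $t \to \infty$,
$$\frac{\hat\nu(t) - t/\mu}{\sqrt t} \xrightarrow{d} N\Bigl(0, \frac{\sigma^2}{\mu^3}\Bigr). \tag{A.7}$$
If further $\sigma^2 > 0$, this can be written $\hat\nu(t) \sim \mathrm{AsN}(\mu^{-1}t, \sigma^2\mu^{-3}t)$.
Moreover, if also $\mathbb{E} X_0^2 < \infty$, then
$$\operatorname{Var}(\hat\nu(t)) = \frac{\sigma^2}{\mu^3}t + o(t) \tag{A.8}$$
and
$$\mathbb{E}\bigl(\hat\nu(t) - t/\mu\bigr)^2 = \frac{\sigma^2}{\mu^3}t + o(t). \tag{A.9}$$

Proof. See e.g. Gut [13, Theorem 2.5.2] for the case $X_0 = 0$, noting that (A.8) and (A.9) are equivalent because $\mathbb{E}\,\hat\nu(t) - t/\mu = O(1)$ by Theorem A.2; again, the case with a general $X_0$ is similar, or follows by conditioning on $X_0$. The case $\sigma^2 = 0$ is trivial. □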

Remark A.4. We can allow $X_0 = X_0^{(n)}$ to depend on $n$ in Theorems A.1-A.3, provided almost sure convergence is weakened to convergence in probability in (A.1) and we add the following uniformity assumptions: $X_0^{(n)}$ is tight; for $L^r$ convergence and (A.2) we further assume that $\sup_n\mathbb{E}|X_0^{(n)}|^r < \infty$; for Theorem A.2 we assume that $X_0^{(n)}$ are uniformly integrable; for (A.8) and (A.9) we assume that $\sup_n\mathbb{E}|X_0^{(n)}|^2 < \infty$.

For the evaluation of (A.6) when $X_0$ is non-trivial, we note the following formula.

Lemma A.5. Suppose that $X$ has a continuous distribution with finite mean, and a characteristic function $\varphi(t) := \mathbb{E} e^{\mathrm{i}tX}$ that satisfies $\varphi(t) = O(|t|^{-\delta})$ for some $\delta > 0$. Then, for any real $u$,
$$\mathbb{E}\{X + u\} = \frac12 - \sum_{n\neq 0}\frac{\varphi(2\pi n)\,e^{2\pi\mathrm{i}nu}}{2\pi n\mathrm{i}}.$$

Proof. Let $X_u := \lfloor X + u\rfloor - u + 1$. Then $\{X + u\} = X - X_u + 1$, and the result follows from the formula for $\mathbb{E} X_u$ in [H, Theorem 2.3]. □
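
As an illustration of Lemma A.5 (with a distribution chosen here only for convenience, not taken from the paper), let $X$ be exponential with mean 1, so that $\varphi(t) = 1/(1 - \mathrm{i}t)$. For $0 \le u < 1$ one has directly $\mathbb{E}\{X + u\} = \mathbb{E}(X + u) - \sum_{k\ge 1}\mathbb{P}(X + u \ge k) = 1 + u - e^{u}/(e - 1)$, which can be compared with a truncated version of the series. The truncation level `terms` and the function names are assumptions of this sketch.

```python
import cmath
import math


def mean_frac_series(u, terms=20000):
    """Truncated right-hand side of Lemma A.5 for X ~ Exp(1), phi(t) = 1/(1 - it)."""
    total = 0.0
    for n in range(1, terms + 1):
        s = 2.0 * math.pi * n
        term = (1.0 / (1.0 - 1j * s)) * cmath.exp(1j * s * u) / (1j * s)
        total += 2.0 * term.real          # the terms for n and -n are complex conjugates
    return 0.5 - total


def mean_frac_exact(u):
    """E{X + u} for X ~ Exp(1) and 0 <= u < 1, from E(X + u) - E floor(X + u)."""
    return 1.0 + u - math.exp(u) / (math.e - 1.0)


if __name__ == "__main__":
    for u in (0.0, 0.3, 0.9):
        print(u, mean_frac_series(u), mean_frac_exact(u))
```

The terms of the series decay like $n^{-2}$ here, so the truncation error with the default `terms` is of order $10^{-6}$.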

For the next theorem (known as the key renewal theorem), we say that a function $f \ge 0$ on $(-\infty,\infty)$ is directly Riemann integrable if the upper and lower Riemann sums $\sum_{k=-\infty}^{\infty}h\sup_{[(k-1)h,kh)}f$ and $\sum_{k=-\infty}^{\infty}h\inf_{[(k-1)h,kh)}f$ are finite and converge to the same limit as $h \to 0$. (See further Feller [11, Section XI.1]; Feller considers functions on $[0,\infty)$, but this makes no difference.) For most purposes, the following sufficient condition suffices. (Usually, one can take $F = f$.)

Lemma A.6. Suppose that $f$ is a non-negative function on $(-\infty,\infty)$. If $f$ is bounded and a.e. continuous, and there exists an integrable function $F$ with $0 \le f \le F$ such that $F$ is non-decreasing on $(-\infty,-A)$ and non-increasing on $(A,\infty)$ for some $A$, then $f$ is directly Riemann integrable.

Sketch of proof. It is well-known that the boundedness and a.e. continuity implies Riemann integrability on any finite interval $[-B,B]$. Using the dominating function $F$, one sees that the tails of the Riemann sums coming from intervals $[(k-1)h, kh)$ outside $[-B,B]$ can be made arbitrarily small, uniformly in $h \in (0,1]$, by choosing $B$ large. □

Theorem A.7. Let $f$ be any non-negative directly Riemann integrable function on $(-\infty,\infty)$.

(i) If the distribution of $X_1$ is non-arithmetic, then, as $t \to \infty$,
$$\int_0^\infty f(s-t)\,dU(s) \to \frac1\mu\int_{-\infty}^{\infty}f(s)\,ds, \tag{A.10}$$
$$\int_0^\infty f(t-s)\,dU(s) \to \frac1\mu\int_{-\infty}^{\infty}f(s)\,ds. \tag{A.11}$$

(ii) If the distribution of $X_1$ is $d$-arithmetic, then, as $t \to \infty$,
$$\int_0^\infty f(s-t)\,dU(s) = \frac1\mu\psi(t) + o(1), \tag{A.12}$$
$$\int_0^\infty f(t-s)\,dU(s) = \frac1\mu\psi(-t) + o(1), \tag{A.13}$$
where $\psi(t)$ is the bounded $d$-periodic function
$$\psi(t) := d\sum_{k=-\infty}^{\infty}f(kd - t); \tag{A.14}$$
$\psi$ has the Fourier series
$$\psi(t) \sim \sum_{m=-\infty}^{\infty}\hat\psi(m)\,e^{2\pi\mathrm{i}mt/d} \tag{A.15}$$
with
$$\hat\psi(m) = \hat f(-2\pi m/d) = \int_{-\infty}^{\infty}e^{2\pi\mathrm{i}mt/d}f(t)\,dt. \tag{A.16}$$

In particular, the average of $\psi$ is $\hat\psi(0) = \int_{-\infty}^{\infty}f$. The series (A.14) converges uniformly on $[0,d]$; thus $\psi$ is continuous if $f$ is. Further, if $f$ is sufficiently smooth (an integrable second derivative is enough), then the Fourier series (A.15) converges uniformly.

Proof. The two formulas (A.10) and (A.11) are equivalent by the substitution $f(x) \to f(-x)$. The theorem is usually stated in the form (A.11) for functions $f$ supported on $[0,\infty)$; then the integral is $\int_0^t f(t-s)\,dU(s)$. However, the proof in Feller [11, Section XI.1] applies to the more general form above as well. (The proof is based on approximations with step functions and the special case when $f(x)$ is an indicator function of an interval; the latter case is known as Blackwell's renewal theorem.) In fact, a substantially more general version of (A.11), where also the increments $X_i$ may take negative values, is given in [2, Theorem 4.2].

Part (ii) follows similarly (and more easily) from the fact that the measure $dU$ is concentrated on $\{kd : k \ge 0\}$, and thus $\int_0^\infty f(s-t)\,dU(s) - \frac1\mu\psi(t) = \sum_{k=-\infty}^{\infty}f(kd - t)\bigl(dU\{kd\} - d/\mu\bigr)$ together with the renewal theorem $dU\{kd\} - d/\mu \to 0$ as $k \to \infty$. The Fourier coefficient calculation in (A.16) is straightforward and standard. □

Finally, we consider a situation where we are given also another sequence $Y_1, Y_2, \ldots$ of random variables such that the pairs $(X_i, Y_i)$, $i \ge 1$, are i.i.d., while $Y_i$ and $X_i$ may be (and typically are) dependent on each other. ($Y_i$ need not be positive.) We denote the means by $\mu_X := \mathbb{E} X_1$ and $\mu_Y := \mathbb{E} Y_1$; thus $\mu_X = \mu$ in the earlier notation, and we assume as above that $0 < \mu_X < \infty$. We also suppose that $X_0$ is independent of all $(X_i, Y_i)$, $i \ge 1$. Let $V_n := \sum_{i=1}^{n}Y_i$.

Theorem A.8. Suppose that $\sigma_X^2 := \operatorname{Var} X_1 < \infty$ and $\sigma_Y^2 := \operatorname{Var} Y_1 < \infty$, and let
$$\hat\sigma^2 := \operatorname{Var}(\mu_XY_1 - \mu_YX_1).$$
Then
$$\frac{V_{\hat\nu(t)} - \frac{\mu_Y}{\mu_X}t}{\sqrt t} \xrightarrow{d} N\Bigl(0, \frac{\hat\sigma^2}{\mu_X^3}\Bigr).$$
If $\hat\sigma^2 > 0$, this can also be written as
$$V_{\hat\nu(t)} \sim \mathrm{AsN}\Bigl(\frac{\mu_Y}{\mu_X}t,\ \frac{\hat\sigma^2}{\mu_X^3}t\Bigr).$$

Note that the special case $Y_i = 1$ yields (A.7).

Proof. For $X_0 = 0$, and thus $\hat\nu(t) = \nu(t)$, this is Gut [13, Theorem 4.2.3]. The general case follows by the same proof, or by conditioning on $X_0$. □

Remark A.9. Again, we can allow $X_0 = X_0^{(n)}$ to depend on $n$, as long as the family $X_0^{(n)}$ is tight.